Abstract

Additive regression models have a long history in multivariate nonparametric regression.
They provide a model in which each regression function depends only on a single explanatory
variable, which makes it possible to obtain estimators at the optimal univariate rate. Besides
backfitting, marginal integration is a common procedure to estimate each component. In this paper, we
propose a robust estimator of the additive components which combines local polynomials on the
component to be estimated with marginal integration. The proposed estimators are consistent
and asymptotically normally distributed. A simulation study shows the advantage
of the proposal over the classical one when outliers are present in the responses, leading to
estimators with good robustness and efficiency properties.
1 Introduction


Several authors have dealt with the dimensionality reduction problem in non-parametric regression
models. In particular, additive models allow the modelling of a response $Y$ as a sum of smooth
functions of the individual covariates $X = (X_1, \dots, X_d)^T$. The advantage of additive models over
general non-parametric regression models is that they circumvent the so-called curse of
dimensionality, which is caused by the fact that the expected number of observations in local
neighbourhoods decreases exponentially as a function of the dimension $d$ of the covariates. More
precisely, Stone (1985) defined the curse of dimensionality as “being that the amount of data
required to avoid an unacceptably large variance increases rapidly with increasing dimensionality”.
This results in a poor convergence rate of the estimators which, as is well known, depends
exponentially on the dimension and on the degree of smoothness of the regression function. To
be more precise, let $(X^T, Y)^T$ be a random vector where $Y \in \mathbb{R}$ is the dependent variable and
$X \in \mathbb{R}^d$ is the vector of explanatory variables. Consider the non-parametric regression model
$Y = g(X) + \sigma(X)\varepsilon$, where the error $\varepsilon$ is independent of $X$ and centered at zero and
$g : \mathbb{R}^d \to \mathbb{R}$ is the function to be estimated. Stone (1980, 1982) showed that the optimal rate for
estimating $g$ is $n^{-\ell/(2\ell+d)}$, where $\ell$ is the degree of smoothness of $g$.

To face this problem, Stone (1985) and Hastie and Tibshirani (1990) considered additive models,
which generalize linear models, solve the problem of the curse of dimensionality and provide
easily interpretable models. Additive models assume that $g(x) = \mu + \sum_{j=1}^d g_j(x_j)$, where $\mu$ is the
location parameter and the additive components $g_j : \mathbb{R} \to \mathbb{R}$ satisfy some additional condition to
be identifiable, such as $E g_j(X_j) = 0$. One of the advantages of additive models is that they allow for
independent interpretation of the effect of each variable on the regression function $g$, as in linear
regression models. Besides, as shown by Stone (1985), for such regression models the optimal rate
for estimating $g$ is the one-dimensional rate of convergence $n^{-\ell/(2\ell+1)}$, leading to dimensionality
reduction through additive modelling.

Several estimation procedures to fit additive models have been proposed in the literature. The
iterative method called backfitting, proposed by Buja, Hastie and Tibshirani (1989) and Hastie and
Tibshirani (1990), is one of the most popular procedures. Even if the procedure converges quickly,
its iterative nature makes it difficult to analyse its statistical properties. Besides, the backfitting
algorithm does not allow derivatives to be estimated, since it does not give a closed form for the estimator.
On the other hand, the marginal integration procedure proposed by Tjøstheim and Auestad (1994)
and Linton and Nielsen (1995) and generalized by Chen et al. (1996) allows for the derivation of
a closed form for the estimator and has been shown to work very well in simulation studies, see
Sperlich et al. (1999). In particular, Severance-Lossin and Sperlich (1999) combine the integration
procedure with a local polynomial approach to estimate simultaneously the additive components
and their derivatives. When first moments exist, the idea behind marginal integration is to estimate
the marginal effects, defined as the expectation of $Y$ with respect to the random error $\varepsilon$ and all the
covariates except $X_\alpha$, which is fixed. The marginal effect describes how $Y$ varies on average when $X_\alpha$
varies. If the true multivariate function $g$ is additive, the marginal effects match the additive
components $g_\alpha$, except for the constant $\mu$, allowing for precise estimation under an additive model.
The estimators are obtained by estimating, in a first step, the multivariate function $g$ and then using
the marginal integration procedure to obtain the marginal effects.

As in other non-parametric settings, the estimators obtained through marginal integration can
be seriously affected by a relatively small proportion of atypical observations if the smoother chosen
to estimate the multivariate function $g$ is not resistant to outliers in the response variable. As is
well known, in a non-parametric framework outlying observations can be even more dangerous
than in a parametric model, since extreme points affect the scale and the shape of any estimate of
the regression function based on local averaging, leading to possibly wrong conclusions. This has
motivated the interest in combining the ideas of robustness with those of smoothed regression, to
develop procedures which are resistant to deviations from the central model in non-parametric
regression models. In this paper, we go further and focus on robust estimators for additive models,
leading to reliable non-parametric regression estimators when atypical responses arise and which
attain a univariate rate of convergence. Indeed, we seek consistent estimators of the regression
function $g$ without requiring moment conditions on the errors $\varepsilon_i$, so as to include the well-known
$\alpha$-contaminated neighbourhood for the error distribution. More precisely, in a robust framework, one
looks for procedures that remain valid when $\varepsilon_i \sim F_0 \in \mathcal{F}_\alpha = \{G : G(y) = (1 - \alpha) G_0(y) + \alpha H(y)\}$,
with $H$ any symmetric distribution and $G_0$ a central model with possible first or second moments.
No moment conditions are required on the errors, so that outliers correspond to deviations in the
error distribution.

In this framework, some resistant procedures for additive models based on M-smoothers have
been considered previously in the literature. Bianco and Boente (1998) considered robust estimators
for additive models using kernel regression. Their approach, which is a robust version of that
considered in Baek and Wehrly (1993), has the drawback of assuming that $Y - g_j(X_j)$ is independent
of $X_j$, which is difficult to justify or verify in practice. Robust estimators based on backfitting and
penalized splines M-estimators have been proposed for generalized additive models by Alimadad
and Salibian-Barrera (2012) and Wong et al. (2014). In the particular case of the non-parametric
regression model $Y = g(X) + \sigma(X)\varepsilon$, with $g(x) = \mu + \sum_{j=1}^d g_j(x_j)$, the procedures considered in
Alimadad and Salibian-Barrera (2012) and Wong et al. (2014) assume that the scale function is
known. For generalized additive models with nuisance parameters, Croux et al. (2011) provide
a robust fit using penalized splines, while recently Boente et al. (2015) combine the backfitting
algorithm with robust univariate scale equivariant smoothers to provide robust estimators under
an additive model with unknown scale. However, to our knowledge, except for the estimators
considered in Bianco and Boente (1998), the asymptotic distribution of the estimators mentioned
above has not been obtained.

On the other hand, Li et al. (2012) introduced robust estimators of the additive components
$g_j$ using local linear regression and marginal integration and derived their asymptotic behaviour.
Besides assuming that the scale is known, the main disadvantage of the procedure defined in Li
et al. (2012) is that the estimators solve the curse of dimensionality only when d < 4, since the
local multivariate polynomial considered is of order one. This effect has also been described for the
classical estimators, based on a local least squares approach, by Hengartner and Sperlich (2005)
and Kong et al. (2010), who noted that to solve the curse of dimensionality the order of the local
polynomial approximation should increase with the dimension of the covariates, leading to a higher
numerical complexity. To avoid this problem, Severance-Lossin and Sperlich (1999) modified the
initial estimators used in the integration procedure, using higher order kernels and local polynomials
that depend only on the covariate $X_j$ related to the $j$-th additive component to be estimated.

In this paper, we introduce robust estimators of the additive components using local polynomials
on the component to be estimated and marginal integration. In this sense, our approach can
be viewed as a robust version of the estimators defined in Severance-Lossin and Sperlich (1999).
Besides, our proposal also provides robust estimators of the derivatives of the marginal
components. Taking into account that in some studies, especially in many biological situations,
missing responses may arise, we will provide a unified approach for complete data sets and for
data sets in which responses are missing at random. The rest of the paper is organized as follows.
Section 2 introduces the family of estimators to be considered. Consistency results and the asymptotic
distribution are derived in Sections 3 and 4, respectively. Finally, the results of a numerical
experiment conducted to evaluate the performance of the proposed procedure with respect to its
classical counterpart defined in Severance-Lossin and Sperlich (1999) are reported in Section 5.
Proofs are relegated to the Appendix.


2 The estimators 

We will consider robust inference with an incomplete data set $(X_i^T, Y_i, \delta_i)^T$, $1 \le i \le n$, where
$\delta_i = 1$ if $Y_i$ is observed and $\delta_i = 0$ if $Y_i$ is missing. Let $(X^T, Y, \delta)^T$ be a random vector with
the same distribution as $(X_i^T, Y_i, \delta_i)^T$ and assume that $(X^T, Y)^T$ satisfies the additive model
$Y = \mu + \sum_{j=1}^d g_j(X_j) + \sigma(X)\varepsilon$, where the error $\varepsilon$ is independent of $X$ with symmetric distribution $F_0(\cdot)$,
that is, we assume that the error's scale equals 1 to identify the scale function. Hence, when second
moments exist, we have that $E(Y|X) = g(X) = \mu + \sum_{j=1}^d g_j(X_j)$ and $\mathrm{Var}(Y|X) = \sigma^2(X)$ is
the conditional variance function. Some additional conditions on the marginal components, to be
discussed below, need to be required in order to guarantee identifiability.

Our aim is to estimate the non-parametric regression components $g_j$ and their derivatives in
a robust way with the data set at hand. An ignorable missing mechanism will be imposed by
assuming that $\delta$ and $Y$ are conditionally independent given $X$, i.e.,

$$P(\delta = 1 \mid Y, X) = P(\delta = 1 \mid X) = p(X) .$$ (1)

To define the conditions needed for identifiability, we begin by fixing some notation. We will
partition $X_i$ and $x$ into a scalar and a $(d-1)$-dimensional sub-vector. To lighten the notation,
we denote $X_i = (X_{i,\alpha}, X_{i,\underline{\alpha}}^T)^T$ and $x = (x_\alpha, x_{\underline{\alpha}}^T)^T$, respectively, where $x_\alpha$ and $x_{\underline{\alpha}}$ are the directions of
interest and not of interest, respectively. As in Linton and Nielsen (1995) and Nielsen and Linton
(1998), let $Q$ be a given probability measure with density $q(x)$. Denote $q_\alpha(x)\,dx = dQ_\alpha(x)$ and
$q_{\underline{\alpha}}(x_{\underline{\alpha}})\,dx_{\underline{\alpha}} = dQ_{\underline{\alpha}}(x_{\underline{\alpha}})$, where $Q_\alpha$ stands for the $\alpha$-th marginal of the measure $Q$ and $Q_{\underline{\alpha}}$ corresponds to
the marginal of $X_{\underline{\alpha}}$. From now on, the additive components will be identified using the condition

$$\int g_\alpha(x)\, q_\alpha(x)\, dx = 0 \quad \text{for all } \alpha = 1, \dots, d .$$ (2)

In particular, when $q = f_X$, the density of $X$, equation (2) corresponds to $E g_\alpha(X_\alpha) = 0$ for
$\alpha = 1, \dots, d$. However, to define the estimators we assume that the marginal $q_\alpha$ is known and so
the choice $q = f_X$ is not a valid one. It is worth noting that the location parameter $\mu$ equals
$\int g(x)\,dQ(x)$, so it can be estimated with a root-$n$ rate of convergence using a preliminary regression
estimator. For that reason, throughout this paper, we assume that $\mu = 0$, i.e., $\int g(x)\,dQ(x) = 0$.
Hence, the model to be considered throughout this paper is

$$Y = \sum_{j=1}^d g_j(X_j) + \sigma(X)\varepsilon ,$$ (3)

where the error $\varepsilon$ is independent of $X$ and has a symmetric distribution $F_0$.

The estimators to be defined are based on initial local polynomial M-estimators of order $q$
for the regression function $g$, where the polynomial to be considered is expanded only on the
component of interest. More precisely, if we are interested in estimating $g_\alpha(x)$, the $\alpha$-th additive
component, the estimator to be considered treats the covariate which corresponds to
the direction of interest differently from the other ones, calculating a robust local polynomial of order $q$ only
on the $\alpha$-th direction. As for the estimators introduced in Severance-Lossin and Sperlich (1999),
the use of higher order kernels yields resistant estimators of the additive component
which achieve the optimal univariate rate of convergence.

To reduce the effect of outliers on the regression estimates, we replace the square loss function
in Severance-Lossin and Sperlich (1999) by a function $\rho$ with bounded derivative. Usually, the
loss function depends on a tuning constant $c$ chosen to achieve a given efficiency, so that it can
be written as $\rho(u) = \rho_c(u) = c^2 \rho_1(u/c)$. Typical choices for the loss function are the Huber loss
function, defined as $\rho_1(u) = \rho_H(u) = (u^2/2)\, I_{\{|u| \le 1\}} + (|u| - 1/2)\, I_{\{|u| > 1\}}$, and Tukey's loss,
defined as $\rho_1(u) = \rho_T(u) = \min(3u^2 - 3u^4 + u^6, 1)$, which provides an example of a bounded loss function.
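The two loss functions and their derivatives (the score functions) can be sketched in a few lines. This is only an illustration: the function names are ours, Huber's loss is written with the scaling $c^2\rho_1(u/c)$, Tukey's loss is scaled as $\rho_1(u/c)$ so that it stays bounded by 1 (an assumption of this sketch), and the default $c = 4.685$ for Tukey's loss is the conventional choice rather than a value taken from this paper.

```python
def rho_huber(u, c=1.345):
    """Huber loss c^2 * rho_1(u / c): quadratic for |u| <= c, linear beyond."""
    a = abs(u)
    return 0.5 * u * u if a <= c else c * a - 0.5 * c * c


def psi_huber(u, c=1.345):
    """Derivative of rho_huber: the identity clipped to [-c, c], hence bounded."""
    return max(-c, min(c, u))


def rho_tukey(u, c=4.685):
    """Tukey loss rho_1(u / c) with rho_1(t) = min(3t^2 - 3t^4 + t^6, 1)."""
    t2 = (u / c) ** 2
    return min(3.0 * t2 - 3.0 * t2 ** 2 + t2 ** 3, 1.0)


def psi_tukey(u, c=4.685):
    """Derivative of rho_tukey; it vanishes for |u| > c (redescending score)."""
    if abs(u) > c:
        return 0.0
    t = u / c
    return (6.0 / c) * t * (1.0 - t * t) ** 2
```

Since both score functions are bounded, a single large residual contributes at most a fixed amount to the estimating equations, which is the mechanism that limits the influence of outlying responses.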

The bounded derivative of the loss function controls the effect of outlying values in the responses.
As is well known, to obtain robust scale invariant estimators, the residuals must be standardized
using a robust scale estimator. For that reason, from now on, $\hat{s}(x)$ stands for a preliminary robust
consistent scale estimator which can be taken, for instance, as the local MAD defined in Boente
and Fraiman (1989). If the additive model has homoscedastic errors, i.e., if $\sigma(x) = \sigma$ for all $x$, the
estimator $\hat{s}(x) = \hat{s}$ to be considered below can be defined as the MAD of the residuals obtained with
a simple and robust nonparametric regression estimator, such as the local median.
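As an illustration of the homoscedastic case, the sketch below computes the MAD of the residuals from a local median smoother. It is a simplified version under our own assumptions: a single covariate, a fixed window half-width `h`, and the usual consistency constant 1.4826 that calibrates the MAD at the normal distribution. None of the names come from the paper.

```python
from statistics import median


def mad(values, consistency=1.4826):
    """Median absolute deviation about the median, scaled for normal errors."""
    values = list(values)
    center = median(values)
    return consistency * median(abs(v - center) for v in values)


def local_median(x0, xs, ys, h):
    """Median of the responses whose covariate lies within h of x0."""
    return median(y for x, y in zip(xs, ys) if abs(x - x0) <= h)


def homoscedastic_scale(xs, ys, h):
    """MAD of the residuals from the local median fit (univariate sketch)."""
    residuals = [y - local_median(x, xs, ys, h) for x, y in zip(xs, ys)]
    return mad(residuals)
```

Each window contains at least the observation at which the smoother is evaluated, so `local_median` never receives an empty sample when called from `homoscedastic_scale`.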

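Assume that we are interested in estimating $g_\alpha$, which is assumed to be continuously
differentiable up to order $q$. Denote $g_\alpha^{(\nu)}(x_\alpha)$ the derivative of order $\nu$ of the component $g_\alpha$
and let $\beta^{(\alpha)}(x) = \beta(x) = (g(x), g_\alpha^{(1)}(x_\alpha), \dots, g_\alpha^{(q)}(x_\alpha)/q!)^T$. An estimator of $\beta^{(\alpha)}(x)$ can be defined
as the value $\hat\beta^{(\alpha)}(x) = \hat\beta(x) = (\hat\beta_0(x), \hat\beta_1(x), \dots, \hat\beta_q(x))^T$ such that

$$\hat\beta^{(\alpha)}(x) = \hat\beta(x) = \mathop{\mathrm{argmin}}_{\beta} \sum_{i=1}^n \delta_i\, \mathcal{K}_{H_d}(X_i - x)\, \rho\left( \frac{Y_i - \beta_0 - \sum_{j=1}^q \beta_j (X_{i,\alpha} - x_\alpha)^j}{\hat{s}(x)} \right)$$ (4)

with $\mathcal{K}_{H_d}(X_i - x) = (\det(H_d))^{-1} \mathcal{K}(H_d^{-1}(X_i - x))$, $\mathcal{K}(x) = \prod_{j=1}^d K_j(x_j)$ with $K_j : \mathbb{R} \to \mathbb{R}$ univariate
kernels and $H_d = \mathrm{diag}(h_1, \dots, h_d)$ a diagonal bandwidth matrix. When there is no confusion, we
will omit the superscript $(\alpha)$ to lighten the notation.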
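To make the minimization concrete, the following sketch computes a local linear ($q = 1$) M-estimate at a single point with Huber's loss by iteratively reweighted least squares. It is not the authors' R implementation: the choice of IRLS, the Epanechnikov product kernel, the starting value and all function names are assumptions made for illustration, and the scale `s` is taken as given.

```python
from statistics import median


def epanechnikov(u):
    """Epanechnikov kernel with support [-1, 1]."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def local_linear_huber(x, X, Y, delta, h, alpha, s, c=1.345, n_iter=50, tol=1e-10):
    """Local linear Huber M-estimate (beta_0, beta_1) at x, expanded on coordinate alpha."""
    # Kernel weights delta_i * K_H(X_i - x); they stay fixed across iterations.
    kern = []
    for Xi, di in zip(X, delta):
        k = float(di)
        for xij, xj, hj in zip(Xi, x, h):
            k *= epanechnikov((xij - xj) / hj) / hj
        kern.append(k)
    t = [Xi[alpha] - x[alpha] for Xi in X]
    active = [i for i, k in enumerate(kern) if k > 0.0]
    if len(active) < 2:
        raise ValueError("too few observed responses in the local window")
    # Robust starting value: local median for the level, zero slope.
    b0, b1 = median(Y[i] for i in active), 0.0
    for _ in range(n_iter):
        s0 = s1 = s2 = t0 = t1 = 0.0
        for i in active:
            r = (Y[i] - b0 - b1 * t[i]) / s
            # Huber weight psi(r) / r downweights large standardized residuals.
            w = kern[i] * (1.0 if abs(r) <= c else c / abs(r))
            s0 += w
            s1 += w * t[i]
            s2 += w * t[i] ** 2
            t0 += w * Y[i]
            t1 += w * t[i] * Y[i]
        det = s0 * s2 - s1 * s1
        if det <= 0.0:
            raise ValueError("degenerate local design")
        # Weighted least squares solution of the 2 x 2 normal equations.
        new_b0 = (s2 * t0 - s1 * t1) / det
        new_b1 = (s0 * t1 - s1 * t0) / det
        done = abs(new_b0 - b0) + abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        if done:
            break
    return b0, b1
```

Responses with $\delta_i = 0$ receive zero weight and are never read, so missing values can be stored as any placeholder. The first returned coordinate plays the role of $\hat\beta_0(x)$ and the second estimates the first derivative of $g_\alpha$ at $x_\alpha$.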

The preliminary estimator of the regression function $g(x)$, denoted $\hat{g}_{M_{q,\alpha}}(x)$, is defined as
$\hat{g}_{M_{q,\alpha}}(x) = \hat\beta_0(x)$, where the letter M indicates that we are using a local M-estimator and the
subscripts “$q, \alpha$” indicate the order of the local polynomial used on the $\alpha$-th component of $x$.

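Finally, the robust estimator of the $\alpha$-th component is obtained through the marginal integration
procedure as

$$\hat{g}_{\alpha, M_{q,\alpha}}(x_\alpha) = \int \hat{g}_{M_{q,\alpha}}(x_\alpha, u_{\underline{\alpha}})\, q_{\underline{\alpha}}(u_{\underline{\alpha}})\, du_{\underline{\alpha}} = \int e_1^T \hat\beta(x_\alpha, u_{\underline{\alpha}})\, q_{\underline{\alpha}}(u_{\underline{\alpha}})\, du_{\underline{\alpha}} ,$$ (5)

where $e_j \in \mathbb{R}^{q+1}$ is the vector with its $j$-th coordinate equal to 1 and the other ones equal to 0. Moreover,
an estimator of the derivative of order $\nu$, $1 \le \nu \le q$, of $g_\alpha$ is given by

$$\hat{g}^{(\nu)}_{\alpha, M_{q,\alpha}}(x_\alpha) = \nu! \int \hat\beta_\nu(x_\alpha, u_{\underline{\alpha}})\, q_{\underline{\alpha}}(u_{\underline{\alpha}})\, du_{\underline{\alpha}} = \nu! \int e_{\nu+1}^T \hat\beta(x_\alpha, u_{\underline{\alpha}})\, q_{\underline{\alpha}}(u_{\underline{\alpha}})\, du_{\underline{\alpha}} .$$

Finally, the robust estimator of the multivariate regression function $g$ is defined as

$$\hat{g}_{M_q}(x) = \sum_{\alpha=1}^d \hat{g}_{\alpha, M_{q,\alpha}}(x_\alpha) .$$

When $\mu \ne 0$, an estimator $\hat\mu$ of $\mu$ should be subtracted in the expressions of the marginal
component estimators in order to obtain consistent estimators, that is, the estimator of $g_\alpha$ equals
$\hat{g}_{\alpha, M_{q,\alpha}}(x_\alpha) = \int \hat{g}_{M_{q,\alpha}}(x_\alpha, u_{\underline{\alpha}})\, q_{\underline{\alpha}}(u_{\underline{\alpha}})\, du_{\underline{\alpha}} - \hat\mu$, while the estimator of the multivariate regression
function $g$ is $\hat{g}_{M_q}(x) = \hat\mu + \sum_{\alpha=1}^d \hat{g}_{\alpha, M_{q,\alpha}}(x_\alpha)$. A possible choice for $\hat\mu$ is to compute a robust location
estimator $\hat{a}$ of the residuals $Y_i - \sum_{\alpha=1}^d \int \hat{g}_{M_{q,\alpha}}(X_{i,\alpha}, u_{\underline{\alpha}})\, q_{\underline{\alpha}}(u_{\underline{\alpha}})\, du_{\underline{\alpha}}$ and to define $\hat\mu = -\hat{a}/(d-1)$. The
practitioner may also choose as location estimator $\hat\mu = (1/d) \sum_{j=1}^d \hat\mu_j$, where $\hat\mu_j = \int \hat{g}_{M_{q,j}}(u)\, dQ(u)$.
However, this estimator does not necessarily have a root-$n$ order of convergence, a fact which has
already been mentioned by Sperlich et al. (1999) for the classical estimators.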

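It is worth noting that when $\rho$ is continuously differentiable with derivative $\rho' = \psi$, $\hat\beta^{(\alpha)}(x)$
satisfies the following system of equations

$$\Psi_{n,\alpha}(\hat\beta^{(\alpha)}(x), x, \hat{s}(x)) = 0_{q+1} ,$$ (6)

where $\Psi_{n,\alpha}(\beta, x, \sigma) = (\Psi_{n,\alpha,0}(\beta, x, \sigma), \dots, \Psi_{n,\alpha,q}(\beta, x, \sigma))^T$ and $\Psi_{n,\alpha,\ell}(\beta, x, \sigma)$ is defined for
$\ell = 0, 1, \dots, q$ as

$$\Psi_{n,\alpha,\ell}(\beta, x, \sigma) = \sum_{i=1}^n \delta_i\, \mathcal{K}_{H_d}(X_i - x)\, \psi\left( \frac{Y_i - \beta_0 - \sum_{j=1}^q \beta_j (X_{i,\alpha} - x_\alpha)^j}{\sigma} \right) (X_{i,\alpha} - x_\alpha)^\ell .$$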


3 Consistency 

In this section, we will show that the estimators defined in Section 2 are strongly consistent. Recall
that the preliminary local M-estimator based on local polynomials of order $q$, $\hat\beta(x)$, is adapted
to the additive component $\alpha$ we want to estimate. Hence, we fix $\alpha \in \{1, \dots, d\}$ and, to derive strong
consistency results for the estimator of the additive component $g_\alpha$, we state the conditions adapted
to the choice of $\alpha$. The kernels to be used are also adapted to this framework. However, in order
to allow more flexibility, we will not force the bandwidths $h_{j,n}$, $j \ne \alpha$, to share a common value,
allowing different bandwidths for each component.

In what follows, $C$ stands for any compact set and, for any function $m : \mathbb{R}^d \to \mathbb{R}$, we denote
$i(m) = \inf_{x \in C} m(x)$. Let $1 \le \alpha \le d$ be fixed and denote $s^{(\alpha)}_{ij} = \int u_\alpha^{i+j} \mathcal{K}(u)\, du = \int u^{i+j} K_\alpha(u)\, du$,
$0 \le i, j \le q$, with $u = (u_1, \dots, u_d)^T$. The following set of assumptions will be needed.

A0 The product measure $Q$ has compact support $S_Q$ contained in the support $S_f$ of $f_X$.

A1 $(X_i^T, Y_i, \delta_i)^T$, $1 \le i \le n$, are i.i.d. vectors satisfying (1). Moreover, $(X_i^T, Y_i)^T$ fulfil the additive
model (3) where the functions $g_\alpha$ verify (2).

A2 The density function $f_X(x)$ of $X$ and the missingness probability $p(x)$ are bounded over the
compact set $C \subset S_f$ and such that $i(p) > 0$ and $i(f_X) > 0$. Moreover, $p$ and $f_X$ are continuous in a
neighbourhood of $C$.

A3 $\sigma(x)$ and $g(x)$ are continuous functions of $x$ in a neighbourhood of $C$ and $i(\sigma) > 0$.

A4 a) For all $j = 1, \dots, d$, the marginal component $g_j$ is continuously differentiable in a
neighbourhood of the support, $S_j$, of the density of $X_j$ with derivative $g_j' = g_j^{(1)}$ bounded.
b) $g_\alpha$ is $(q+1)$-times continuously differentiable.

A5 a) The kernel function $\mathcal{K} : \mathbb{R}^d \to \mathbb{R}$ is such that $\mathcal{K}(x) = \prod_{j=1}^d K_j(x_j)$, where $K_j : \mathbb{R} \to \mathbb{R}$
have bounded support, say $[-1, 1]$, and $\int K_j(u)\, du = 1$. Besides, $K_j : [-1, 1] \to \mathbb{R}$ are
even, bounded functions and Lipschitz continuous of order one.

b) The matrix $S^{(\alpha)}$ is positive definite, where its $(j,k)$ entry is $s^{(\alpha)}_{j-1,k-1} = \int u^{j+k-2} K_\alpha(u)\, du$ for
$1 \le j, k \le q+1$.

A6 The bandwidth sequences are such that $h_{j,n} \to 0$ and $n \prod_{j=1}^d h_{j,n} / \log n \to \infty$.

A7 The function $\rho$ is an even and three times continuously differentiable function with bounded
derivatives $\psi = \rho'$, $\psi'$ and $\psi''$. Furthermore, $E(\psi'(\varepsilon)) > 0$ and $\zeta_1(u) = u\psi'(u)$ and
$\zeta_2(u) = u\psi''(u)$ are bounded.

A8 The scale estimator $\hat{s}(\cdot)$ satisfies $\sup_{x \in C} |\hat{s}(x) - \sigma(x)| \to 0$ almost surely.

Remark 3.1. Assumptions A3 to A6 are standard conditions to derive consistency results in
nonparametric regression models. Assumption A1 establishes that the model is an additive one
where the components are identifiable. On the other hand, A0 is a standard condition when using
marginal integration procedures. It is worth noting that A2 implies that some response variables
are observed for all $x \in C$, which is a common assumption in the literature on missing data. Note
that A5 implies that $s^{(\alpha)}_{00} = 1$ and $s^{(\alpha)}_{ij} = 0$ if $i + j$ is odd. Assumptions A1 and A7 imply that
$E\psi(\varepsilon/\sigma) = 0$ for any $\sigma > 0$. Assumption A7 is a standard condition on the score function when
local polynomials and scale estimators are considered. Finally, A8 requires uniform consistency
of the preliminary scale estimator, which is needed to derive uniform consistency of the initial
regression function estimator. Note that A4 entails that the derivative of $g_\alpha$ of order $q+1$, $g_\alpha^{(q+1)}$, is bounded
in $S_\alpha$.

Remark 3.2. It is easy to see that A3 and A8 imply that the robust scale estimator has upper
and lower uniform bounds almost surely. More precisely, if $A = \inf_{x \in C} \sigma(x)/2$ and
$B = (3/2) \sup_{x \in C} \sigma(x)$, we have that

$$P\left( \exists\, n_0 \text{ such that for all } n \ge n_0 \text{ and for all } x \in C, \ A \le \hat{s}(x) \le B \right) = 1 .$$ (7)

On the other hand, if we denote $\hat{a}_\sigma(x) = \sigma(x)/\hat{s}(x)$, A3 and A8 imply

$$\sup_{x \in C} |\hat{a}_\sigma(x) - 1| \to 0 \quad \text{almost surely}.$$ (8)

From now on, we denote as $H^{(\alpha)}$ the diagonal matrix given by $H^{(\alpha)} = \mathrm{diag}(1, h_\alpha, h_\alpha^2, \dots, h_\alpha^q)$.

Proposition 3.1. Let $C \subset S_f$ be a compact set such that A2 is satisfied. Assume that A1 to A8
hold. Then, there exists a solution $\hat\beta(x)$ of (6) such that $\sup_{x \in C} \| H^{(\alpha)} \{ \hat\beta(x) - \beta(x) \} \| \to 0$ almost surely, where
$\beta(x) = (g(x), g_\alpha^{(1)}(x_\alpha), \dots, g_\alpha^{(q)}(x_\alpha)/q!)^T$ and $g_\alpha^{(1)} = g_\alpha'$.

Theorem 3.1 shows the consistency of the marginal integration estimator of the regression
function and its derivatives when using local polynomials of order $q$ in the direction $\alpha$. We omit
the proof of Theorem 3.1 since it follows straightforwardly from Proposition 3.1 using similar
arguments to those considered in the proof of Theorem 3.2.3 in Boente and Martinez (2015).

Theorem 3.1. Assume that A0 to A8 hold with $C = S_Q \subset S_f$ and some fixed $\alpha$. Denote as $C_\alpha$
the support of $Q_\alpha$. Then, we have that

a) $\sup_{x \in C_\alpha} |\hat{g}_{\alpha, M_{q,\alpha}}(x) - g_\alpha(x)| \to 0$ almost surely,

b) $\sup_{x \in C_\alpha} |\hat{g}^{(\nu)}_{\alpha, M_{q,\alpha}}(x) - g_\alpha^{(\nu)}(x)| \to 0$ almost surely.

Furthermore, if, for any $\alpha = 1, \dots, d$, A0 to A8 hold for the kernels used to define the $\alpha$-th
additive component estimators and $C = S_Q$, then $\sup_{x \in C} |\hat{g}_{M_q}(x) - g(x)| \to 0$ almost surely, where
$\hat{g}_{M_q}(x) = \sum_{j=1}^d \hat{g}_{j, M_{q,j}}(x_j)$.

4 Asymptotic distribution 

In this section, we derive the asymptotic distribution of the $\alpha$-th additive component estimator. As
in Severance-Lossin and Sperlich (1999), we will assume that to compute the preliminary estimator
$\hat{g}_{M_{q,\alpha}}(x)$ the diagonal bandwidth matrix is such that its $\alpha$-th diagonal element equals $h_\alpha$ and
the remaining ones are $h$, i.e., we assume that $h_j = h$ for $j \ne \alpha$. Moreover, we will consider
two different univariate even and bounded kernels, $K$ and $L$. The kernel $K$ is positive and used
over the $\alpha$-th coordinate of $x$, i.e., $K_\alpha = K$. On the other hand, the kernel $L$ is used on the
remaining components of $x$, that is, $K_j = L$ for $j \ne \alpha$. Furthermore, to obtain a univariate rate
of convergence for the $\alpha$-th additive component estimator, $L$ will be chosen as a kernel of order
$\ell \ge 2$, that is, $\int L(u)\, du = 1$, $\int u^s L(u)\, du = 0$ for $s = 1, \dots, \ell - 1$ and $\int u^\ell L(u)\, du \ne 0$. Clearly, the
choice of kernel and bandwidth as well as the computation of the preliminary estimator need to be
done for each additive component to be estimated, making the method computationally expensive.
Thus, some additional numerical complexity seems to be the price of a better convergence rate.

Throughout this section, we will assume a homoscedastic model, that is, $\sigma(x) = \sigma$, so that the
additive model can be written as $Y = \sum_{j=1}^d g_j(X_j) + \sigma\varepsilon$, where the error $\varepsilon$ is independent of $X$
and has a symmetric distribution $F_0$ with scale 1, so as to identify $\sigma$. We will also assume that a
robust root-$n$ convergent scale estimator $\hat{s}$ of $\sigma$ is available.

Due to the choice of kernels and the homoscedasticity assumption, assumptions A1, A4 and A5
will be replaced by the following ones.


N1 $(X_i^T, Y_i, \delta_i)^T$, $1 \le i \le n$, are i.i.d. vectors satisfying (1). Moreover, $(X_i^T, Y_i)^T$ are such that
$Y_i = \sum_{j=1}^d g_j(X_{ij}) + \sigma\varepsilon_i$, where the errors $\varepsilon_i$ are independent of $X_i$ with symmetric distribution
$F_0$ and the functions $g_\alpha$ verify (2).

N2 For all $j = 1, \dots, d$ with $j \ne \alpha$, the marginal component $g_j$ is $\ell$ times continuously differentiable
in a neighbourhood of the support $S_j$ of the density of $X_j$ and $g_j^{(\ell)}$ is bounded. Besides, $g_\alpha$ is
continuously differentiable up to order $q+1$ and the derivative of order $q+1$, $g_\alpha^{(q+1)}$, is bounded in $S_\alpha$.

N3 a) The kernel function $\mathcal{K} : \mathbb{R}^d \to \mathbb{R}$ is such that $\mathcal{K}(x) = \prod_{j=1}^d K_j(x_j)$, where $K_\alpha = K$ and
$K_j = L$ for $j \ne \alpha$. Moreover, $K$ and
$L$ are bounded, even, compactly supported and Lipschitz continuous with $\int K(u)\, du =
\int L(u)\, du = 1$. Without loss of generality, we assume that the support of $K$ and $L$ is
$[-1, 1]$.

b) The kernel $K$ is such that the matrix $S^{(\alpha)} = \left( \int u^{j+k-2} K_\alpha(u)\, du \right)_{1 \le j,k \le q+1}$ defined in A5b)
is non-singular.

c) The kernel $L$ is a kernel of order $\ell \ge 2$, that is, $\int L(u)\, du = 1$, $\int u^j L(u)\, du = 0$ if
$1 \le j \le \ell - 1$ and $\int u^\ell L(u)\, du \ne 0$.

N4 The bandwidth sequences $h_j = h_{j,n} > 0$ are such that $h_{j,n} = h_n \to 0$ for $j \ne \alpha$ and $h_\alpha = \beta n^{-1/(2q+3)}$.
Moreover, $h = h_n$ is such that $h = o\left( n^{-(q+1)/(\ell(2q+3))} \right)$ and $n^{(q+1)/(2q+3)} h^{d-1} / \log n \to \infty$.

N5 The function $q_{\underline{\alpha}}(u)$ is continuous and the functions $f_X(u)$ and $p(u)$ are continuously
differentiable up to order $\ell$. Furthermore, $\sup_{x \in S_Q} f_X(x) < \infty$, $\inf_{x \in C} f_X(x) > 0$ and $\inf_{x \in C} p(x) > 0$,
where $C \subset S_f$ stands for some compact neighbourhood of $S_Q$.

Assumptions N2 to N4 correspond to assumptions A3, A1 and A2 in Severance-Lossin and
Sperlich (1999), respectively. Note that the order $\ell$ of the kernel $L$ is an even number, since $L$ is an
even function. Also, notice that N2 implies that $\sup_{x \in S_Q} |g(x)| < \infty$. The proof of the asymptotic
distribution of the preliminary estimators $\hat{g}_{M_{q,\alpha}}(x)$ can be found in Martinez (2014).

Denote $\lambda(a) = E\psi(\varepsilon_1 + a)$ and $\lambda_1(a) = E\psi'(\varepsilon_1 + a)$. Given a symmetric matrix $A \in \mathbb{R}^{m \times m}$,
$\nu_1(A) \le \dots \le \nu_m(A)$ stand for the eigenvalues of $A$.


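Theorem 4.1. Assume that A0, A2, A7 and N1 to N5 hold and that the function $\lambda(a) = E\psi(\varepsilon_1 + a)$ has
bounded Lipschitz continuous derivatives up to order $\ell - 1$ in a neighbourhood of 0. Let $\hat{s}$ be a
consistent estimator of $\sigma$ such that $\sqrt{n}(\hat{s} - \sigma) = O_P(1)$. Let $x$ be an interior point of $S_f$ and $\hat\beta(x)$
be a solution of (6) with $\hat{s}(x) = \hat{s}$, for all $x$, such that $\sup_{x \in S_Q} \| H^{(\alpha)} [\hat\beta(x) - \beta(x)] \| \stackrel{p}{\longrightarrow} 0$, where
$\beta(x) = (g(x), g_\alpha^{(1)}(x_\alpha), \dots, g_\alpha^{(q)}(x_\alpha)/q!)^T$ and $g_\alpha^{(1)} = g_\alpha'$. Then, we have that

$$\sqrt{n h_\alpha} \left[ \hat{g}_{\alpha, M_{q,\alpha}}(x_\alpha) - g_\alpha(x_\alpha) \right] \stackrel{D}{\longrightarrow} N\left( b_{q,\alpha}(x_\alpha), \sigma^2_{q,\alpha}(x_\alpha) \right)$$

where

$$b_{q,\alpha}(x_\alpha) = \beta^{\frac{2q+3}{2}}\, \frac{1}{(q+1)!}\, g_\alpha^{(q+1)}(x_\alpha)\, e_1^T (S^{(\alpha)})^{-1} s_q^{(\alpha)} ,$$

$$\sigma^2_{q,\alpha}(x_\alpha) = \sigma^2\, \frac{E\psi^2(\varepsilon)}{[E\psi'(\varepsilon)]^2} \left( \int \frac{q_{\underline{\alpha}}^2(x_{\underline{\alpha}})}{f_X(x_\alpha, x_{\underline{\alpha}})\, p(x_\alpha, x_{\underline{\alpha}})}\, dx_{\underline{\alpha}} \right) e_1^T (S^{(\alpha)})^{-1} V_\alpha (S^{(\alpha)})^{-1} e_1 ,$$

with $s_q^{(\alpha)} = (s_{q,1}^{(\alpha)}, \dots, s_{q,q+1}^{(\alpha)})^T$, where $s_{q,j}^{(\alpha)} = \int t^{q+j} K(t)\, dt$ for $j = 1, \dots, q+1$, and $V_\alpha = (v_{\ell m})$
with $v_{\ell m} = \int u^{\ell+m-2} K^2(u)\, du$.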


Remark 4.1. It is worth noting that, as in other nonparametric settings, the asymptotic bias does
not depend on the score function. Moreover, the score function appears in the asymptotic variance
through the quantity

$$\frac{E\psi^2(\varepsilon)}{[E\psi'(\varepsilon)]^2} ,$$

which is similar to that given in the location setting. Hence, to calibrate the estimators to attain
a given efficiency it is enough to choose the same tuning constant as in a location model.



Assume that the smoothing parameter $h$ in the directions not of interest is such that $h = \gamma n^{-r}$.
Then, $h = o\left( n^{-(q+1)/(\ell(2q+3))} \right)$ if and only if $r > (q+1)/(\ell(2q+3))$. On the other hand,
$n^{(q+1)/(2q+3)} h^{d-1} / \log n \to \infty$ when $r < (q+1)/((2q+3)(d-1))$. Hence, the bandwidth rate of $h$ must satisfy

$$\frac{q+1}{\ell(2q+3)} < r < \frac{q+1}{(2q+3)(d-1)} ,$$ (9)

which implies that the practitioner must choose a kernel $L$ with order at least the dimension of the
covariates, i.e., $\ell \ge d$.
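The admissible range in (9) is easy to tabulate for given $q$, $\ell$ and $d$. The helper below is a small sketch (its name is ours); it returns the open interval for the exponent $r$, or `None` when the interval is empty, which happens exactly when the kernel order does not exceed $d - 1$.

```python
from fractions import Fraction


def rate_interval(q, ell, d):
    """Open interval (lower, upper) for r in (9), or None if it is empty (needs d >= 2)."""
    lower = Fraction(q + 1, ell * (2 * q + 3))
    upper = Fraction(q + 1, (2 * q + 3) * (d - 1))
    return (lower, upper) if lower < upper else None


# Local linear fit (q = 1) in dimension d = 4:
print(rate_interval(1, 4, 4))  # a fourth order kernel leaves a non-empty range
print(rate_interval(1, 2, 4))  # a second order kernel does not
```

For instance, with $q = 1$ and $d = 2$ a second order kernel such as the Epanechnikov kernel already gives the non-empty range $1/5 < r < 2/5$.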


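Theorem 4.2. Assume that A0, A2, A7 and N1 to N5 hold and that the function $\lambda(a) = E\psi(\varepsilon_1 + a)$
has bounded Lipschitz continuous derivatives up to order $\ell - 1$ in a neighbourhood of 0. Let
$\hat{s}$ be a consistent estimator of $\sigma$ such that $\sqrt{n}(\hat{s} - \sigma) = O_P(1)$. Let $x$ be an interior point of
$S_f$ and $\hat\beta(x)$ be a solution of (6) such that $\sup_{x \in S_Q} \| H^{(\alpha)} [\hat\beta(x) - \beta(x)] \| \stackrel{p}{\longrightarrow} 0$, where $\beta(x) =
(g(x), g_\alpha^{(1)}(x_\alpha), \dots, g_\alpha^{(q)}(x_\alpha)/q!)^T$ and $g_\alpha^{(1)} = g_\alpha'$. Then, we have that for $\nu = 1, \dots, q$

$$\sqrt{n h_\alpha^{2\nu+1}} \left[ \hat{g}^{(\nu)}_{\alpha, M_{q,\alpha}}(x_\alpha) - g_\alpha^{(\nu)}(x_\alpha) \right] \stackrel{D}{\longrightarrow} N\left( b_{\nu,q,\alpha}(x_\alpha), \sigma^2_{\nu,q,\alpha}(x_\alpha) \right)$$

where

$$b_{\nu,q,\alpha}(x_\alpha) = \nu!\, \beta^{\frac{2q+3}{2}}\, \frac{1}{(q+1)!}\, g_\alpha^{(q+1)}(x_\alpha)\, e_{\nu+1}^T (S^{(\alpha)})^{-1} s_q^{(\alpha)} ,$$

$$\sigma^2_{\nu,q,\alpha}(x_\alpha) = (\nu!)^2\, \sigma^2\, \frac{E\psi^2(\varepsilon)}{[E\psi'(\varepsilon)]^2} \left( \int \frac{q_{\underline{\alpha}}^2(x_{\underline{\alpha}})}{f_X(x_\alpha, x_{\underline{\alpha}})\, p(x_\alpha, x_{\underline{\alpha}})}\, dx_{\underline{\alpha}} \right) e_{\nu+1}^T (S^{(\alpha)})^{-1} V_\alpha (S^{(\alpha)})^{-1} e_{\nu+1} ,$$

with $s_q^{(\alpha)}$ and $V_\alpha$ given in Theorem 4.1.

It is worth noting that $s_q^{(\alpha)}$ is such that its $j$-th component, $1 \le j \le q+1$, equals 0 when $q + j$
is odd, since $K$ is an even function. Hence, if $q + \nu + 1$ is odd or, equivalently, when $q - \nu$ is even,
the bias will be 0. Hence, the bias term in the estimation of $g_\alpha^{(\nu)}$ appears only when $q - \nu$ is odd.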


5 Monte Carlo Study 

This section contains the results of a simulation study conducted with the aim of comparing the
performance of the estimators defined in Section 2 with that of their classical counterparts introduced
in Severance-Lossin and Sperlich (1999), which correspond to the choice $\rho(u) = u^2$. We have
performed $N = 500$ replications taking samples of size $n = 500$ when the dimension of the covariates
is $d = 2$ and $d = 4$. We considered samples without outliers and also samples contaminated in
different ways. For $d = 2$, we also included in our experiment cases where the response variable
may be missing. All computations were carried out using an R implementation of our algorithm,
which can be provided upon request.

To generate missing responses, we first generated observations $(X_i^T, Y_i)^T$ satisfying the additive
model $Y = g_0(X) + u = \mu_0 + \sum_{j=1}^d g_j(X_j) + u$, where $u = \sigma_0 \varepsilon$. Then, we generated
independent Bernoulli random variables $\delta_i$ such that $P(\delta_i = 1 \mid Y_i, X_i) = P(\delta_i = 1 \mid X_i) = p(X_i)$. When
$d = 2$ we used two different missing probabilities: $p(x) = 1$, which corresponds to the case where
all the responses are observed, and $p(x) = p_2(x) = 0.4 + 0.5(\cos(x_1 + 0.2))^2$, which yields around
31.5% of missing responses. For $d = 4$, we only report the results for $p(x) = 1$.
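The missing-data mechanism can be illustrated with a short sketch. We assume here that $X_1$ is uniform on $[0, 1]$, as in the design used for $d = 2$; the function names and the midpoint-rule approximation of the expected missing rate are ours.

```python
import math
import random


def p2(x1):
    """Probability of observing the response, p_2(x) = 0.4 + 0.5 cos^2(x_1 + 0.2)."""
    return 0.4 + 0.5 * math.cos(x1 + 0.2) ** 2


def expected_missing_rate(p, m=10000):
    """Midpoint-rule approximation of 1 - E p(X_1) for X_1 uniform on [0, 1]."""
    return 1.0 - sum(p((k + 0.5) / m) for k in range(m)) / m


def draw_indicators(x1_values, p, rng):
    """Bernoulli indicators: delta_i = 1 (response observed) with probability p(x_i1)."""
    return [1 if rng.random() < p(x1) else 0 for x1 in x1_values]


rng = random.Random(2024)
x1 = [rng.random() for _ in range(500)]
delta = draw_indicators(x1, p2, rng)
print(expected_missing_rate(p2), 1.0 - sum(delta) / len(delta))
```

Under this uniform design the expected proportion of missing responses is close to the figure quoted above, and the indicators depend on the covariates only, as required by (1).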

In all cases, we considered polynomials of order $q = 1$. The smoothers were computed using
the Epanechnikov kernel $K_1(u) = K_2(u) = 0.75(1 - u^2)\, I_{[-1,1]}(u)$ when $d = 2$, while for $d = 4$
we chose $K$ as the Epanechnikov kernel and $L$ as the fourth order kernel $L(u) = (15/32)(1 -
u^2)(3 - 7u^2)\, I_{[-1,1]}(u)$ when estimating $g_\alpha$. We compared the classical marginal integration estimator,
denoted $\hat{g}_C$, with the robust marginal integration estimator, denoted $\hat{g}_R$, using Huber's loss
function with tuning constant $c = 1.345$. To identify the marginal component estimators, we added
a subscript indicating the additive component label.
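The order of the two kernels can be checked exactly, since both are polynomials on $[-1, 1]$. The sketch below (helper names are ours) stores each kernel by its polynomial coefficients and integrates the moments in exact rational arithmetic.

```python
from fractions import Fraction


def moment(coeffs, s):
    """Exact integral over [-1, 1] of u^s * sum_k coeffs[k] * u^k."""
    total = Fraction(0)
    for k, ck in enumerate(coeffs):
        power = k + s
        if power % 2 == 0:  # odd powers integrate to zero on [-1, 1]
            total += Fraction(ck) * Fraction(2, power + 1)
    return total


def kernel_order(coeffs, max_order=10):
    """Smallest s >= 1 with a non-zero s-th moment, or None if none is found."""
    for s in range(1, max_order + 1):
        if moment(coeffs, s) != 0:
            return s
    return None


# Epanechnikov kernel 0.75 (1 - u^2) and L(u) = (15/32)(1 - u^2)(3 - 7u^2) = (15/32)(3 - 10u^2 + 7u^4)
EPANECHNIKOV = [Fraction(3, 4), 0, Fraction(-3, 4)]
FOURTH_ORDER = [Fraction(15, 32) * c for c in (3, 0, -10, 0, 7)]

print(kernel_order(EPANECHNIKOV), kernel_order(FOURTH_ORDER))
```

Both kernels integrate to one; the second moment of $L$ vanishes and its fourth moment equals $-1/21$, so $L$ is indeed a kernel of order four, while the Epanechnikov kernel has order two.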

The performance of each estimator $\hat{g}_j$ of $g_j$, $1 \le j \le d$, was measured through the following
approximated integrated squared error (ISE):

$$\mathrm{ISE}(\hat{g}_j) = \frac{1}{\sum_{i=1}^n \delta_i} \sum_{i=1}^n \left( \hat{g}_j(X_{ij}) - g_j(X_{ij}) \right)^2 \delta_i ,$$

where $X_{ij}$ is the $j$-th component of $X_i$ and $\delta_i = 0$ if the $i$-th response was missing and $\delta_i = 1$
otherwise. A similar measure was used to compare the estimators of the regression function
$g = \mu + \sum_{j=1}^d g_j$.

5.1 Monte Carlo study with d = 2 additive components 

In this case, the covariates were generated from a uniform distribution on the unit square,
$X_i = (X_{i,1}, X_{i,2})^T \sim U([0,1]^2)$, the error scale was $\sigma_0 = 0.5$ and the overall location $\mu_0 = 0$. We chose
as measure in the integration procedure $Q = U([0,1]^2)$ and the integral in (5) was approximated
by the mean over 500 points generated according to $Q$.

The additive components were chosen to be

$$g_1(x_1) = 24(x_1 - 0.5)^2 - 2 , \qquad g_2(x_2) = 2\pi \sin(\pi x_2) - 4 .$$

We fixed both bandwidths $h_1$ and $h_2$ at 0.1. These values are close to the optimal ones with
respect to the integrated mean square error for the bandwidth $h_\alpha = \beta n^{-1/(2q+3)}$ given in N4 (see
Severance-Lossin and Sperlich, 1999).
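The following sketch illustrates the two ingredients of this design: the components satisfy the identifiability condition (2) under the uniform measure, and the integral in (5) can be approximated by an average over points drawn from $Q$. For illustration only, the true regression function is plugged in where the preliminary estimator would be used; the function names and the seed are ours.

```python
import math
import random


def g1(x1):
    return 24.0 * (x1 - 0.5) ** 2 - 2.0


def g2(x2):
    return 2.0 * math.pi * math.sin(math.pi * x2) - 4.0


def uniform_mean(g, m=1000):
    """Midpoint-rule approximation of the integral of g over [0, 1]."""
    return sum(g((k + 0.5) / m) for k in range(m)) / m


def marginal_integration(g_hat, x_alpha, u_points):
    """Monte Carlo version of (5): average g_hat(x_alpha, u) over the points u drawn from Q."""
    return sum(g_hat(x_alpha, u) for u in u_points) / len(u_points)


rng = random.Random(0)
u_points = [rng.random() for _ in range(500)]  # 500 points, as in the study


def g_plug_in(x1, x2):
    """Stand-in for the preliminary estimator: the true additive function."""
    return g1(x1) + g2(x2)


print(uniform_mean(g1), uniform_mean(g2))
print(marginal_integration(g_plug_in, 0.3, u_points), g1(0.3))
```

Because $g_2$ integrates to zero over $[0, 1]$, averaging over the second coordinate leaves $g_1(x_1)$ up to Monte Carlo error, which is the property exploited by the marginal integration procedure.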

For the errors, we considered the following settings:

• $C_0$: $u_i \sim N(0, \sigma_0^2)$.

• $C_1$: $u_i \sim (1 - 0.15)\, N(0, \sigma_0^2) + 0.15\, N(15, 0.01)$.

• $C_2$: $u_i \sim N(10, 0.01)$ for all $i$'s such that $X_i \in \mathcal{D}_{0.09}$, where $\mathcal{D}_\eta = [0.2, 0.2 + \eta]^2$.

• $C_3$: $u_i \sim (1 - 0.30)\, N(0, \sigma_0^2) + 0.30\, N(15, 0.01)$ for all $i$'s such that $X_i \in \mathcal{D}_{0.3}$.


Case $C_0$ corresponds to samples without outliers and illustrates the loss of efficiency incurred by using a robust estimator when it may not be needed. The contamination setting $C_1$ corresponds to a gross-error model where all observations have the same chance of being contaminated. On the other hand, case $C_2$ is pathological in the sense that all observations with covariates in the square $[0.2, 0.29] \times [0.2, 0.29]$ are severely affected. Note that we chose an area whose side length is smaller than the bandwidth; otherwise, the initial estimator will be severely affected. Finally, case $C_3$ is a gross-error model with a higher probability of observing an outlier, but the outliers are restricted to the square $[0.2, 0.5] \times [0.2, 0.5]$.
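A minimal sketch of how samples under the four error settings might be generated is given below. Several points are assumptions and not stated in the text: the second argument of $N(\cdot,\cdot)$ is read as a variance (so the outlier standard deviation is 0.1), observations outside the contaminated squares receive clean $N(0, \sigma_0^2)$ errors, and the sample size passed to `simulate` is arbitrary. All function names are hypothetical.

```python
import numpy as np

SIGMA0 = 0.5  # error scale sigma_0 of Section 5.1

def g1(x):
    return 24.0 * (x - 0.5) ** 2 - 2.0

def g2(x):
    return 2.0 * np.pi * np.sin(np.pi * x) - 4.0

def in_square(x, eta):
    """True for rows of x lying in [0.2, 0.2 + eta] x [0.2, 0.2 + eta]."""
    return np.all((x >= 0.2) & (x <= 0.2 + eta), axis=1)

def draw_errors(x, setting, rng):
    n = x.shape[0]
    clean = rng.normal(0.0, SIGMA0, n)
    if setting == "C0":
        return clean
    if setting == "C1":      # 15% outliers everywhere
        out = rng.random(n) < 0.15
        return np.where(out, rng.normal(15.0, 0.1, n), clean)
    if setting == "C2":      # every point in the small square is shifted
        return np.where(in_square(x, 0.09), rng.normal(10.0, 0.1, n), clean)
    if setting == "C3":      # 30% outliers, only inside the larger square
        out = in_square(x, 0.3) & (rng.random(n) < 0.30)
        return np.where(out, rng.normal(15.0, 0.1, n), clean)
    raise ValueError(f"unknown setting: {setting}")

def simulate(n, setting, seed=0):
    """Covariates uniform on the unit square, responses from the additive model (mu = 0)."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 1.0, size=(n, 2))
    y = g1(x[:, 0]) + g2(x[:, 1]) + draw_errors(x, setting, rng)
    return x, y

x, y = simulate(500, "C1", seed=0)
```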

To summarize the values of $\mathrm{ISE}(\widehat{g}_j)$ and $\mathrm{ISE}(\widehat{g})$ over replications, we report an approximation of the mean integrated squared error, denoted MISE, which is obtained by averaging the ISE over all replications, and a more robust measure, denoted MEDISE, that corresponds to the median over replications of the ISE. The obtained results are given in Table 1, for the different error distributions as well as for data sets with and without missing responses.
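The two summary measures are simply the mean and the median of the ISE values collected over replications. The toy values below are invented for illustration and are not simulation results.

```python
import numpy as np

def summarize_ise(ise_values):
    """Return the MISE (mean) and MEDISE (median) of ISE values over replications."""
    ise = np.asarray(ise_values, dtype=float)
    return {"mise": float(np.mean(ise)), "medise": float(np.median(ise))}

# One atypical replication inflates the mean but barely moves the median.
print(summarize_ise([0.01, 0.02, 0.03, 5.0]))
```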





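$p(\mathbf{x}) \equiv 1$

| Measure | Setting | $\widehat{g}_{\mathrm{C}}$ | $\widehat{g}_{1,\mathrm{C}}$ | $\widehat{g}_{2,\mathrm{C}}$ | $\widehat{g}_{\mathrm{R}}$ | $\widehat{g}_{1,\mathrm{R}}$ | $\widehat{g}_{2,\mathrm{R}}$ |
|---|---|---|---|---|---|---|---|
| MISE | $C_0$ | 0.0188 | 0.0216 | 0.0174 | 0.0172 | 0.0200 | 0.0183 |
| MISE | $C_1$ | 6.4543 | 0.7739 | 0.5208 | 0.9348 | 0.4706 | 0.2517 |
| MISE | $C_2$ | 0.1005 | 0.0590 | 0.0532 | 0.0557 | 0.0374 | 0.0363 |
| MISE | $C_3$ | 0.8652 | 0.3662 | 0.3472 | 0.1557 | 0.0811 | 0.0792 |
| MEDISE | $C_0$ | 0.0103 | 0.0113 | 0.0111 | 0.0108 | 0.0118 | 0.0115 |
| MEDISE | $C_1$ | 6.0024 | 0.4251 | 0.4179 | 0.5153 | 0.1474 | 0.1335 |
| MEDISE | $C_2$ | 0.0850 | 0.0477 | 0.0481 | 0.0300 | 0.0234 | 0.0264 |
| MEDISE | $C_3$ | 0.8030 | 0.3456 | 0.3249 | 0.0483 | 0.0338 | 0.0363 |

$p(\mathbf{x}) = 0.4 + 0.5\cos^2(x_1 + 0.2)$

| Measure | Setting | $\widehat{g}_{\mathrm{C}}$ | $\widehat{g}_{1,\mathrm{C}}$ | $\widehat{g}_{2,\mathrm{C}}$ | $\widehat{g}_{\mathrm{R}}$ | $\widehat{g}_{1,\mathrm{R}}$ | $\widehat{g}_{2,\mathrm{R}}$ |
|---|---|---|---|---|---|---|---|
| MISE | $C_0$ | 0.2506 | 0.1278 | 0.1451 | 0.2540 | 0.1270 | 0.1500 |
| MISE | $C_1$ | 16.3902 | 6.2902 | 4.8353 | 8.5197 | 4.5263 | 3.3571 |
| MISE | $C_2$ | 0.3353 | 0.1661 | 0.1823 | 0.2987 | 0.1470 | 0.1708 |
| MISE | $C_3$ | 1.1434 | 0.4921 | 0.4928 | 0.4734 | 0.2252 | 0.2443 |
| MEDISE | $C_0$ | 0.0286 | 0.0220 | 0.0220 | 0.0307 | 0.0232 | 0.0234 |
| MEDISE | $C_1$ | 6.9604 | 0.8206 | 0.7700 | 1.8799 | 0.5898 | 0.5317 |
| MEDISE | $C_2$ | 0.1174 | 0.0641 | 0.0640 | 0.0708 | 0.0415 | 0.0454 |
| MEDISE | $C_3$ | 0.8886 | 0.3728 | 0.3476 | 0.1520 | 0.0798 | 0.0702 |

Table 1: MISE and MEDISE of the estimators of the regression functions $g$, $g_1$ and $g_2$ under different contaminations, for the complete data and for sets with missing responses.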


As expected, when the data contain neither outliers nor missing responses, the robust estimators show larger MEDISE values than the classical estimators based on the square loss function. In a few cases, the MISE values of the robust estimators are slightly smaller than those of the classical ones. However, all these differences are well within the Monte Carlo margin of error. For contaminated errors, the behaviour of the classical and robust estimators is quite different. The contamination setting $C_1$ is the worst for the estimators defined in Severance-Lossin and Sperlich (1999), since 15% of the observations are contaminated with a large residual. Indeed, under $C_1$, the MISE of the classical estimator of the regression function $g$ is more than 6 times larger than that of its robust counterpart, while the MEDISE is 10 times larger. This difference is smaller when estimating the additive components, but is still important. On the other hand, $C_2$ seems to affect the classical estimator less. Indeed, under $C_2$ the MISE and MEDISE of $\widehat{g}_{\mathrm{C}}$ are twice those of $\widehat{g}_{\mathrm{R}}$, while for each additive component, the MISE and MEDISE of the classical estimators are 50% larger than those of the robust ones. Finally, contamination $C_3$ seems to be more harmful than $C_2$. Indeed, the reported MEDISE values for the classical regression estimator $\widehat{g}_{\mathrm{C}}$ are more than 15 times larger than those of the robust estimator, while when estimating each additive component the MEDISE of the classical estimators is 10 times larger than that obtained with its robust counterpart. It is worth noting that the ratio between the MISE of the classical and robust estimators is smaller than the corresponding ratio for the MEDISE, although large values are still obtained. This fact may be explained by the presence of a few samples where the estimators, especially the robust estimator, perform differently from the majority of the samples.

When missing responses arise, as one would expect, all estimators have larger MISE and MEDISE values than when $p \equiv 1$, due to the loss of about 31.5% of the responses. Beyond this fact, similar conclusions can be drawn regarding the advantage of the robust procedure over the classical estimators.

5.2 Monte Carlo study with d = 4 additive components 

For this model we generated covariates $\mathbf{X}_i = (X_{i1}, X_{i2}, X_{i3}, X_{i4})^{\mathrm{T}} \sim \mathcal{U}([-3,3]^4)$, independent errors $\varepsilon_i \sim N(0,1)$ and $\sigma_0 = 0.15$. As for $d = 2$, we chose as measure in the integration procedure $Q = \mathcal{U}([-3,3]^4)$ and, as in Section 5.1, the integral in (5) was also approximated by the mean over 500 points generated according to $Q$.

The additive components chosen are related to those in the numerical study in Severance-Lossin 
and Sperlich (1999) and correspond to 

9o,i{xi) = 9 o,2{x2) = sin(-X 2 ), 

90, 3 (^ 3 ) = - 1-5, 9o,i{x4) = - e"^). 

In this numerical experiment, the bandwidths were selected using a $K$-fold cross-validation procedure as follows. As usual, we first randomly partition the data set into $K$ disjoint subsets of approximately equal sizes, with indices $\mathcal{G}_k$, $1 \le k \le K$, so that $\bigcup_{k=1}^K \mathcal{G}_k = \{1, \dots, n\}$. For each fixed $(h, \widetilde{h})$, let $\mathbf{h} = (h, \widetilde{h})$. Note that when estimating the $\alpha$-th additive component, the bandwidth used for the $\alpha$-th component is $h$, while on the nuisance directions we use $\widetilde{h}$. Moreover, the kernels are also modified depending on the component to be estimated. More precisely, when estimating $g_\alpha$, for $j \ne \alpha$, $K_j = L$ the fourth order kernel described above, while $K_\alpha$ is the Epanechnikov kernel.

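Denote as $\widehat{g}^{(-k)}_{\mathrm{C}}(\mathbf{x})$ and $\widehat{g}^{(-k)}_{\mathrm{R}}(\mathbf{x})$ the classical and robust marginal integration estimators computed with the bandwidths $h$ and $\widetilde{h}$, without using the observations with indices in $\mathcal{G}_k$. The classical $K$-fold cross-validation criterion given by

$$LS(h, \widetilde{h}) = \frac{1}{n} \sum_{k=1}^{K} \sum_{i \in \mathcal{G}_k} \left(Y_i - \widehat{g}^{(-k)}_{\mathrm{C}}(\mathbf{X}_i)\right)^2$$

is minimized over a set $\mathcal{H} \times \mathcal{H}$ of possible bandwidths $(h, \widetilde{h})$.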
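The classical $K$-fold criterion can be sketched generically as follows. The fold construction and the least squares score follow the description above; the smoother `nw_fit_predict` is a plain Nadaraya-Watson estimator with a product Epanechnikov kernel, used here only as a hypothetical stand-in for the marginal integration estimator, and all names are assumptions of this sketch.

```python
import numpy as np

def ls_cv(x, y, fit_predict, bandwidths, n_folds=5, seed=0):
    """Least squares K-fold cross-validation score for one bandwidth choice."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)  # disjoint index sets
    total = 0.0
    for test_idx in folds:
        train = np.ones(len(y), dtype=bool)
        train[test_idx] = False            # fit without the k-th fold
        pred = fit_predict(x[train], y[train], x[test_idx], bandwidths)
        total += np.sum((y[test_idx] - pred) ** 2)
    return total / len(y)

def nw_fit_predict(x_train, y_train, x_new, h):
    """Nadaraya-Watson smoother (stand-in, not the marginal integration estimator)."""
    u = (x_new[:, None, :] - x_train[None, :, :]) / np.asarray(h, dtype=float)
    w = np.prod(0.75 * (1.0 - u**2) * (np.abs(u) <= 1.0), axis=2)
    denom = w.sum(axis=1)
    safe = np.where(denom > 0, denom, 1.0)
    # Fall back to the training mean when no training point lies in the window.
    return np.where(denom > 0, (w @ y_train) / safe, y_train.mean())

def select_bandwidth(x, y, grid, n_folds=5, seed=0):
    """Return the grid element minimising the cross-validation score."""
    scores = [ls_cv(x, y, nw_fit_predict, h, n_folds, seed) for h in grid]
    return grid[int(np.argmin(scores))], scores
```

Using the same random partition for every candidate bandwidth, as the fixed `seed` does here, keeps the scores comparable across the grid.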

On the other hand, as is well known, a robust cross-validation criterion needs to be considered when using robust estimators. The robust $K$-fold cross-validation method used in this numerical study is related to the procedure defined in Boente et al. (2010) and minimizes over $\mathcal{H} \times \mathcal{H}$ the robust criterion


Ln{K ^) = E I (X*)}) 

The number of folds was set to $K = 5$. Due to the computational complexity involved, we only considered the contamination schemes $C_0$ and $C_1$ defined in Section 5.1.

To obtain bandwidths satisfying (9) with q = I, the set H x H of possible values for {h, h) was 
chosen satisfying h = C and h = C n~'^ with r = 0 . 12 . The constant C took initially five 

possible values leading to % = {1,1.5, 2,2.5, 3}. When the minimum, was attained at h = 3, the 
grid was enlarged to include values of /i G {3.5,4,4.5,5, 5.5}. Note that when /i = 1, C ~ 2.11 and 
h = C so we expect in average 3 observations in each 4—dimensional neighbourhood. For that 

reason, to obtain a reliable estimate of the residual scale $\sigma_0$, independently of the choice of $(h, \widetilde{h})$, a preliminary regression estimator was computed using as bandwidth $\mathbf{h}_\sigma = (0.93, 0.93, 0.93, 0.93)$. With these bandwidths, we expect an average of 5 points in each 4-dimensional neighbourhood. It is also worth noting that the optimal bandwidths $h$ to estimate $g_\alpha$ in this model lead to very small values and were not taken as possible values of the grid.

In this numerical study, the ISE of a few samples was very different from that of most data sets, probably because the bandwidth search was not exhaustive. Hence, to provide summary values for $\mathrm{ISE}(\widehat{g}_j)$ and $\mathrm{ISE}(\widehat{g})$ over replications, we report the median over replications as well as the trimmed mean over replications of the ISE with 1% and 5% trimming. Note that the MEDISE corresponds to a 50% trimming. The results obtained under $C_0$ and $C_1$ are given in Table 2.
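The trimmed summaries can be sketched as below. The sketch assumes symmetric trimming, that is, a proportion $\nu$ of the ordered ISE values is discarded in each tail, which is the reading under which a 50% trimming reduces to the median; the text does not state whether the trimming is symmetric, so this is an assumption. The values in the example are invented.

```python
import numpy as np

def trimmed_mean(values, prop):
    """Mean after discarding a proportion `prop` of the ordered values in each tail."""
    v = np.sort(np.asarray(values, dtype=float))
    if prop >= 0.5:
        return float(np.median(v))     # limiting case: the median
    k = int(np.floor(prop * v.size))   # number of values removed per tail
    return float(v[k:v.size - k].mean())

ise = list(range(1, 100)) + [1000]     # 99 ordinary values and one extreme replication
print(trimmed_mean(ise, 0.0), trimmed_mean(ise, 0.01), trimmed_mean(ise, 0.5))
```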



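| $\nu$ | Setting | $\widehat{g}_{\mathrm{C}}$ | $\widehat{g}_{1,\mathrm{C}}$ | $\widehat{g}_{2,\mathrm{C}}$ | $\widehat{g}_{3,\mathrm{C}}$ | $\widehat{g}_{4,\mathrm{C}}$ | $\widehat{g}_{\mathrm{R}}$ | $\widehat{g}_{1,\mathrm{R}}$ | $\widehat{g}_{2,\mathrm{R}}$ | $\widehat{g}_{3,\mathrm{R}}$ | $\widehat{g}_{4,\mathrm{R}}$ |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1% | $C_0$ | 1.0969 | 0.2076 | 0.3019 | 0.1358 | 0.1824 | 1.4550 | 0.0962 | 0.1360 | 0.2527 | 0.0638 |
| 1% | $C_1$ | 10.2064 | 1.0854 | 1.1212 | 0.6874 | 0.5816 | 0.3268 | 0.0945 | 0.1080 | 0.1098 | 0.0653 |
| 5% | $C_0$ | 0.3589 | 0.0788 | 0.1029 | 0.0916 | 0.0547 | 0.3612 | 0.0462 | 0.0498 | 0.0464 | 0.0370 |
| 5% | $C_1$ | 5.2254 | 0.2199 | 0.2324 | 0.2594 | 0.1994 | 0.3210 | 0.0929 | 0.1060 | 0.1087 | 0.0639 |
| 50% | $C_0$ | 0.1536 | 0.0577 | 0.0674 | 0.0808 | 0.0371 | 0.1526 | 0.0391 | 0.0415 | 0.0303 | 0.0277 |
| 50% | $C_1$ | 5.2118 | 0.1875 | 0.2033 | 0.2202 | 0.1738 | 0.3109 | 0.0926 | 0.1040 | 0.1072 | 0.0621 |

Table 2: Trimmed mean of the ISE for the estimators of the regression functions $g$ and $g_j$, $1 \le j \le 4$, under different contaminations. The trimming values $\nu$ considered equal 1%, 5% and 50%.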


The numerical experiment for $d = 4$ yields conclusions similar to those obtained in dimension $d = 2$ regarding the advantage of the robust procedure over the classical one. As expected, the robust marginal integration estimator is less efficient than the classical estimator for clean data. Under $C_1$, the ISE trimmed means of $\widehat{g}_{\mathrm{C}}$ are more than 15 times larger than those obtained with $\widehat{g}_{\mathrm{R}}$. Besides, when considering a 1% trimming, the classical estimators of $g_1$, $g_2$, $g_3$ and $g_4$ give trimmed mean values more than 11, 10, 6 and 8 times larger than those corresponding to the robust estimators. When considering the 5% trimmed mean and the MEDISE, the difference is not as noticeable as with a 1% trimming but it is still large, since in all cases the summary measure of the classical estimator is at least twice that of the robust estimator.
A Appendix

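We begin by fixing some notation which will be useful in the sequel. Denote as

$$R_\alpha(X_{i,\alpha}, x_\alpha) = g_\alpha(X_{i,\alpha}) - g_\alpha(x_\alpha) - \sum_{j=1}^{q} \frac{g_\alpha^{(j)}(x_\alpha)}{j!}\,(X_{i,\alpha} - x_\alpha)^j \qquad \text{(A.1)}$$

and $R(\mathbf{X}_i, \mathbf{x}) = \sum_{j \ne \alpha}\{g_j(X_{i,j}) - g_j(x_j)\} + R_\alpha(X_{i,\alpha}, x_\alpha)$. Furthermore, define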


Xj,a = 1, 


^2,0 (-^2,0: (-^2,a 




hi 


hi 


— (^2,1,a; ■ ■ ■ ?^2,g+l,Q:) 


Then, y, - x,,^H(")/3( x) = Cj + i?(Xj,x)/c7. Let C/j = a{Xi)£i so that Yi = g(Xi) + [/*. Denote 
Vi = cr(Xi)ej/cr(x) = Ui/a{x.) and for r = (/3o, Kl3i, h‘ij32, • • •, ha/3qY define 


4 (r) = -'^Si 




2=1 


-y (Jip 

n 


2=1 

n 


Y — r^x- 
-^2 ^ -^2,1 


s(x) 

(Xj - x), 


^H,(X,-X) 


s(x) 


(A.2) 


2=1 


cj(x) 

1 " 

- y ^Hd(Xi - x)^ji/> (l/j a) Xj,„ . 


a Xi, 


(A.3) 


2=1 


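Given a compact set $\mathcal{C} \subset \mathbb{R}^d$ we denote as $N_\rho(\mathcal{C})$ the minimum number of balls of radius $\rho$ needed to cover $\mathcal{C}$. Then, we have that $\mathcal{C} \subset \bigcup_{k=1}^{N_\rho(\mathcal{C})} \mathcal{B}_d(\mathbf{x}_k, \rho)$ where $\mathcal{B}_d(\mathbf{x}_k, \rho) = \{\mathbf{y} \in \mathbb{R}^d : \|\mathbf{y} - \mathbf{x}_k\| < \rho\}$ stands for the ball of center $\mathbf{x}_k$ and radius $\rho$. It is well known that $N_\rho(\mathcal{C}) \le A_1/\rho^d$ where the constant $A_1$ does not depend on $\rho$. We also denote as $\mathcal{B}_{d,\rho} = \mathcal{B}_d(\mathbf{0}, \rho)$ the ball centered at $\mathbf{0}$ and as $\mathcal{V}_{d,\rho} = \{\mathbf{y} \in \mathbb{R}^d : \|\mathbf{y}\| = \rho\}$ the sphere of center $\mathbf{0}$ and radius $\rho$.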


A.l Proof of Proposition 3.1 


We begin by proving some lemmas that will be helpful in the sequel.

The following lemma corresponds to the well known exponential inequality for bounded variables and can be found, for instance, in Pollard (1984) or Ferraty and Vieu (2006). Lemma A.1.1 is needed to derive Lemma A.1.2, which is a preliminary step in the proof of Lemma A.1.3.

Lemma A.1.1. Let {Zj}j>i be independent random variables such that EZj = 0, \Zi\ < M and 
0-2 = ]e _^2 ^ Then, for all e > 0 we have that 


P 


E^‘ 


> en 


< 2 exp 


e^n I 
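Lemma A.1.2. Let $\mathcal{C} \subset \mathbb{R}^d$ be a compact set with non-empty interior and $\mathcal{I}_\delta = [1 - \delta, 1 + \delta]$ where $\delta < 1/2$. Let $W_i = W_i(a) = f(Y_i, \mathbf{X}_i, \delta_i, a)$ be a sequence of random variables such that $|W_i| \le M$ for all $i$ and $|W_i(a_1) - W_i(a_2)| \le M_1 |a_1 - a_2|$. Define $S_n(\mathbf{x}, a) = (1/n) \sum_{i=1}^n (G_i(\mathbf{x}, a) - \mathbb{E}G_i(\mathbf{x}, a))$ where $G_i(\mathbf{x}, a) = \mathcal{K}_{\mathbf{H}_d}(\mathbf{X}_i - \mathbf{x})\, W_i(a)\, x_{i,j,\alpha}^{m}\, x_{i,\ell,\alpha}^{\widetilde{m}}$, and $m, \widetilde{m} = 0, 1$ and $1 \le j, \ell \le q + 1$ are fixed. Assume also that A2, A6, A5 hold and denote $A_h = 1/\min_{1 \le j \le d} h_j$.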


a) Let On d-nd p = pn be non-negative numerical sequences converging to zero, such that 

9~^Ahp < M 2 , for all n > 1 and pn |nj=i^i| 0- Then, there exist bi > 0 and 

62 > 0 and a constant Cq > 0 such that for all C > Cq and for all n> no. 


P sup sup 

y xGf?(xj.,p)nC 


Sk,s 



< 4 exp 


C^OlnUU^j \ 

biAlp^ + b2C0nAhpj 


where 5fc,^(x, a) = 5„(x,a) - Sn{xk,as), C C \Jhlf^^3d{xk,p) and = [1 - 5,1 + 5] C 

2 ^ ^ as + p]. 


b) Let On and p = pn be non-negative numerical sequences converging to zero, such that 
0~^AhP < M 2 and pn |nj^=i^i| 0- Then, there exist > 0, 1 < j < 4 and a 

constant Cq > 0 such that for any C > Cq and for all n> no, 


P I sup |S'n(x, a)| > COr, 

xec 
KaeXs 





c^elnuW'i ] 

ibjAlp^ +2b.2Ce„A,,pj 


c) Let On = .^log n/{n Wp=ihj). Then, there exists C such that 


y~]P ( 0„^sup sup |S'„(x,a)| > C j < 

xGC aC^Xs / 


00 


n>l 


that is, supxgcsuPagi^ |S’n(x,a)| = Oa.co. {On)- 


Proof, a) For a fixed 1 < A: < Ap(C), let 




- Xk ,Q! {X^,a - - Xk ,a )' {Xi,a - - Xk ,q)^ ^ / W W yT 

-•-) j---; I — \Xk,i,l, ■ ■ ■ ,Xk,i,q+l) , 


where we avoid the subscript a to simplify the notation. For 1 < j, ^ < d + 1 define 1C^^^\\T) = 
Then, we have that /CH^(Xj - = ld^^{Xi - x) and /CHd(Xj - 

'^k)x'^ i jX"^ ii = “ Xfc)- Hence, using that the kernels Kj have compact support in [—1,1] 


we obtain that 


|5'fc,s(x,a)| < - 


n 


'^{Gi{x,a) - Gi{xk,as)) 


2=1 


1 "" 

+ - ^ |EGi(xfc,as) - EGi(x,a)| 
n 


2=1 


< 5fc(x, a) + 5fc,s(xfc,a) 
where


5fc(x,a) = 


n 


^(Gi(x,a) - Gi(xfc,a)) 


1=1 


+ -'^|EGi(xfc,o) -EGi(x,a)| 

n 


2=1 


< 


1\ T ^ 

" E {|'=H'’(Xi - x) - K«'>(X. - xt) 


2=1 


+ |E/ci^'j(X, - x) - E/cgJ(X, - xfc)|} lB(,,h)uB(x„h)(XO 


Sk,s(p^k^ — 


n 


^(Gi(xfc, o) - Gi(xfc, o^)) 


2=1 


+ |EGi(xfc,Os) -EGi(xfc,a)| 

n 


2=1 


with h = (hi,..., hd)^ and -B(x, h) = {y S : \yj — Xj\ < hj for 1 < j < d}. Using that the 
kernels Kj are Lipschitz of order one and that if /C(X,j — x^) 7 ^ 0, then ^ 1) we get easily 

that for any x G Bd{xk, p) 



-x)-/cg^(X, 





which leads to 


sup sup |S’fc(x,a)| < 6 »„^C 2 —VlB^(x^,h+p)(Xi) = 

xez?d(xfc,p)nc aGX.nXi | i^.^i hj n 

where C 2 = 2Mci. 

On the other hand, since p/n^=i 0) we get that there exists ni G N such that for all 

n > ni, nUh j + p) < W^j=i 2 0^=1 hj. Observe that, since n is large enough, we may 

assume that hj < 1 for all j, so A^p < p/nj=i 0. 

Let Zj = Iu_^(xj^_h+p)(Xi) Ah/nj=i Then, using that EIu^(xj^_h+p)(Xi) < C4U^j^iihj + p) 
where C 4 = ||/x||oo) we obtain that for n > ni. 


Zi\ < 


AhP 

nti 


EZj < C 4 W{hj + p) 
j=i 


AhP 

nti h, 


^ ‘2c^^}iP —y 0 . 


Therefore, |Zj — EZ^I < Ahp/W^j^i 


VAR(Zj) < EZf < C 4 n {hj + p) 


i=i 


AhP 

.11^=1 hd. 


nU ’'i 


Then, we have that 


AhP 

n?=i 


'1 n \ 1 " 1 

- IZ^ehx.,h+P)(X*) ]=-'^Zi<- 


2=1 


^(Zi-EZ,) 


2 = 1 


+ 2c4.AhP ■ (A.4) 
On the other hand, the fact that $\theta_n^{-1} A_h \rho \le M_2$ leads us, for any $C > C_0 = 4 c_2 c_4 M_2$, to


9n ^ sup 

5fc(x) 

>c]<p(i 

n 

- EZi) 

y xGB(xfc,p) 


/ V" 

2=1 


> 


COn 

2C2 


Finally, Lemma A.1.1 for all n > ni implies that, for all n > ni, if C > Cq we have that there exist 
61 and 62 


sup |5fc(x,a)| > 

nx.5 


c 


< P ( 9-^Ak,n > 


c 


On the other hand, using that |Wj(ai) — Wj(a 2 )| < Mi|ai 
get that 


(A.5) 

02 ! and the fact that |a — a^l < P) we 


|Gi(xfc,a) -Gi(xfc,as)l < Mipld^\^i - Xfc) < 


Ml p 

nti 


■lBd(xfe,h)(Xi) < 


Ml p 


n 


j=i 


llSd(xfc,h+p)(X*) 


Then, if n > n 2 we have that 


5'fc,s(x/j,a) < 2 Mip^ 'y ^ lli?j(x;;,h+p) (^») — Ak^ri 


2 = 1 


since A^ —)■ 00 . Therefore, if n > max{ni,n 2 } 

/ 


6n^ sup |5'fc,£(x,a)| > ^ I < P ( > ^ ) < 2 exp <( - 


xGS(xn ,p)nC 

V aex'nx, 


2 I - y n 2 

which together with (A.5) concludes the proof of a). 


C^^Oln n^=i hj 


biAlp^ + b2C9nAhP \ ’ 


b) Recall that C C ^di^k^p) with Np{C) < Ai/p'^ and = [1 — <5,1 + 5] C with 

Npi^s) < 2/p. Then, given x S C and a £ Is there exist k,s such that x S B{xk,p), a G Ig. 

Besides, for any x G B{xk,p), we have that |5n(x, o)l < |5n(xfc,as)| + 5fc^s(x, o) 


so 


sup sup 1 5,1 (x, o)I < max |5,i(xfc, a^)! + max sup 5fc^s(x,a) 

x€C ^<^<Np{C) '^<k<Np{C) xG5(xn,p)nC 

\<s<Np{X5) ^<s<Np{X5) a^XsOXs 

which entails that P sup^gc sup^gx^ |5„(x, a)| > O') < /?n + 7n where 


Pn = 


'In — 


C9r. 


( 


max | 5 „(xfc,a 5 )| > 
\<k<Np(C) 2 

\\<s<Np{Xs) 


\ 


< Np{C) Np{Is) sup P |5„(x, a)| > 


xGC 

aGXs 


C9r. 


( 


max sup 

i<fc<iVpCC) xe8(xn,p)nc 

\\<s<Np{Xs) a&XsnXs 


Sk,s(p^: ®) 


> 


C9r. 
Using Lemma A.1.1, straightforward calculations (see Martinez (2014) for details) allow us to show that there exist $b_3, b_4 > 0$ such that


f C^O^n nf-i 1 Ai f C^Onn Oi-i 

Pn < 2Np{C)Np{Is) exp <1 - \ < 2^ exp <| - ” 


463 + 264(70^ j p' 
Using a), it follows that for all C > 2 Co > 0 and all n > no 


463 + 2b^C9rt 


(A.6) 


sup 


> ? 



^ / 

\ a&IsnXs 


/ 


^1 f C^ln 11^=1 hj 
<4^exp<'- 


AbiAlp"^ + 262 COnAhp 


(A.7) 

The bound given in b) follows now from (A. 6 ) and (A.7). 

c) Observe that, A6 implies that 9n 0. Define p = logn/n. We will show that the conditions 
in b) are fulfilled. It is clear that p —^ 0 and besides by A6, pn |nj=i 0- the 

other hand. 


[9^^AhpY = 


1 \ogn \ ^ logn ^ ^ 


n \j log re 

so 9~^AfiP < 1 if re is large enough. Noticing that 0^ref([j^^ hj = log(re) and that there exists Ai 
such that Np{C) < Aip~‘^, we have that b) implies that 

( 


n nj=i hj 


44i 

9~^ sup |S'n(x,a)| > C I < —^ 

x6B{xj,,p)nC I P 

\ a&XsPds 

< llii 

— pd 


[ C^GnnYfj=ihj\ [ 

exp < ——- — > + exp < — 


exp 


463 + 2b/^C9n 

C^ log(re) 

463 + 264 C 0 J] 


+ exp - 


46iA^p2 262 C9nAhp 

C^ log (re) 

AbiA\p^ + 262 C 9 nAhp 


Finally, using that 9n ^ 0 and A^p —)• 0, we obtain that there exists rei such that for all re > rei, 
2biC9n < 463 and 46iA|p^ + 262 CPnA^p < 863 , then 


P 9. 


-1 


sup sup |S'n(x, a)| >C) < 8 A 1 P '^exp<v — 


x€B{xfe,p)nC aGlsHls 


C^ log(re) 
863 


< 8 A 1 


log(re) 


d- 


d-i 


re ®*’3 < 84Ai re ®*'3 


Therefore, for any C > Ci = max{Co, y/Sb^d + 3}, we get that X]„>i P {9^ ^ sup^gc l^n (x))| >C) < 
00 concluding the proof. □ 


Remark A.1.1. Taking $m = \widetilde{m} = 0$ and $W_i \equiv 1$, Lemma A.1.2c) entails the uniform convergence of the kernel density estimator, that is, we obtain that


1 

sup — 
xec n 


^(/Ch,(X,-x)-E/Ch,(X,-x)) 


2=1 


— Oa.co. (^n) • 
Besides, using that $\sup_{\mathbf{x} \in \mathcal{C}} |f_X(\mathbf{x})| < \infty$, that $\mathcal{K}$ has compact support and that $f_X$ is uniformly continuous in $\mathcal{C}$, we get that


sup 

xec 


1 C 

- ^E/CHrf(Xi - x) -/x(x) = sup //C(u) (/x(HrfU + x 
n xGC J 


- /x(x)) 


which together with the above result implies that 


sup 

xGC 


-^/Ch,(X,-x)-/x(x) 


i=l 


— Oa.s. (1) 


(A.8) 


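Lemma A.1.3. Assume that A1, A2 and A4 to A7 hold, then

$$\sup_{\mathbf{x} \in \mathcal{C}} |J_{n,j}(\mathbf{x}, \widehat{a}_\sigma(\mathbf{x}))| = O_{a.s.}(\theta_n)$$

with $\theta_n = \sqrt{\log n / (n \prod_{j=1}^d h_j)}$, where $j = 1, \dots, d + 1$, $J_{n,j}$ is the $j$th component of the vector $\mathbf{J}_n$ and $\widehat{a}_\sigma(\mathbf{x}) = \sigma(\mathbf{x})/s(\mathbf{x})$.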


Proof. By (8), taking 5 = 1/2, we get that there exists Af such that ¥{Af) = 0 and for any u ^ Af, 
there exists ni G N such that for all n > ni sup^gc | 0 (t(x) — 1| <6. Therefore, for any u ^ Af and 
n > TT-i, we have that sup^gc I(x, Oo-(x))| < sup^gc sup„g [;^_5 |J„j(x, a)|, so to conclude the 
proof it is enough to see that supxgc supag[i _5 |J„j(x, a)| = Oa.s.(^n)- Recall that 


Jn(x, o) 


1 


cj(Xi)ei \ 
cj(x) 7 


Xv ry — 


1 

n 


n 

5i/CHd(Xj - x)i/> {Vi a) Xj,« • 

i=l 


then, Jnj{:x-, a) = (1/n) YXi=i ~ ^)V’ (Yi ®) the proof follows now from Lemma 

A.1.2c) taking m = 1, m = 0 and Wi{a) = Slip (Via) and noting that |lTj(a)| < ll'0l|oo/^(i) and 
|lT,(ai) - Wi{a 2 )\ < (||Cl|oo/i(i)) !«! - a2|. □ 


Proof of Proposition 3.1. Let r = (/3o, ha/3i,/ 1 q/ 32 , • • • ,hffj3qY, 7(r) be defined in (A.2) and 
ro(x) = (^fl'(x), haga\xa), ■■■, hfg^^Xa)^ . For the sake of simplicity denote Vr = Vq+i,T = {r : 
||r|| = r}. To prove Proposition 3.1, we will first show that it is enough to see that there exists Af 
such that P(AA) = 0 and such that for all uj ^ Af, given u > 0 there exists 0 < Tj/ < 1 small enough 
such that for any 0 < r < and n > uq, 

inf inf {4(r + ro(x)) - 7(ro(x))} > 0 . (A.9) 

rGVr xGC 

Indeed, in the set {infrgVr iafxGC [^n(i'+ ro(x)) — 7(ro(x))] > 0} for all x G C we have that 
infreVr ^n(r + ro(x)) > 7 (ro(x)), which implies that the function Ln{r) = 7 (r + ro(x)) — 7 (i'o(x)) 

^ o o 

has a local minimum r(x) in where TZ stands for the interior of the set TZ. Then, for all 

X G C, r(x) + ro(x) is a local minimum of l'n(r) and r(x) + ro(x) belongs to i3g+i(ro(x), r) = {r : 
||r — ro(x)|| < r}, as a result of which /3(x) = r(x) + ro(x) is a solution of (6). That is, with 
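probability 1, for all $\mathbf{x} \in \mathcal{C}$, there exists a solution $\widehat{\boldsymbol{\beta}}(\mathbf{x})$ of (6) in the interior of $\mathcal{B}_{q+1}(\boldsymbol{\tau}_0(\mathbf{x}), r)$. Hence, for any $\omega \notin \mathcal{N}$, given $\nu > 0$ and $r > 0$ small enough, $\sup_{\mathbf{x} \in \mathcal{C}} \|\mathbf{H}^{(\alpha)}[\widehat{\boldsymbol{\beta}}(\mathbf{x}) - \boldsymbol{\beta}(\mathbf{x})]\| \le r$ for $n \ge n_0$, which implies that $\sup_{\mathbf{x} \in \mathcal{C}} \|\mathbf{H}^{(\alpha)}[\widehat{\boldsymbol{\beta}}(\mathbf{x}) - \boldsymbol{\beta}(\mathbf{x})]\| \to 0$ almost surely, as desired.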

In order to prove (A.9), observe that — (r + ro(x))"^Xj_Q = Ui + ^(Xj) — — ro^j,a = 

Ui + i?(Xj,x)xi,Q, - r'^Xi^Q,. Denote as Zi(x) = {Ui + i2(Xi, x))/s(x) = Vi{x) + i?(Xj, x)s(x)~^ 
with Vi(x) = Ui/'s{x) = (j(Xj)ej/s(x) and Aj(x) = r'^Xj_Q/s(x). Then, using that p{b) — p{a) = 
'ilj{u)du we obtain that for all r G Vr 


4(r+ro(x))-4(ro(x)) = - ^/CHd(Xi-x)^i / ^ {t) dt = A:„i(x)+A:„ 2 (x)+Ar„ 3 (x), (A.IO) 

^ i=i 

with 

1 n /‘Zifx'l—A,fx) 1 1 ^ 

ilj{Vi)dt = - r'^- djlCuj ,(Xt - x)i/>(I/i(x))xj,„ , 
s(xj n 


A'„i(x) 

X„ 2 (x) 


n 


1 " .Z.(x)-A,(x) 

^5i^Hd(Xj-x) / 

J Zi{-x.) 

1 /‘■?i(x) —Ai(x) 

- x) ‘ ^'(Hi(x))(t - H,(x))dt 


Z =1 


n 


2=1 
1 1 


'ZiU) 


2''2(^) -x)'ip'{Vi{x)) (r^Xi,o) - 2i?(Xi,x)r^Xi, 


A'n3(x) = - ^ (5i/CHd(Xj - x) 


2=1 


rZi{x)-Ai{x) 

'ZiU) 


ijj {t) - 'ilj{Vi{x)) - 'ilj'{Vi{x)){t - Vi{x)) 


dt 


The proof of (A.9) will be done in several steps. Let us assume that the following approximations 
hold 


sup sup ||A:„i(x)|| 

rSVr xeC 


Kn2{^) 


sup sup ||/Cn 3 (x)|| 
rSVr xSC 


logn - 

jd , bn ) 


V n 11^=1 hj 

^^^^E(V^'(e))/x(x)p(x) rTS(")r(l + Cn) + r {hl^^ + '^hA T2,n, 


3i^oL 


T T 


+ /,5+l+^/,. ) t3,„, 

j¥=a 


(A.ll) 

(A. 12 ) 

(A.13) 


with = Oa.co.(l), ^ 3 ,n = Oa.co.(l), ^i,n and T 3 ,„ not depending on r and where Cn = Oa.co.(l) 
and T 2 ,n = Oa.co. (1) do not depend on x G C nor r and, consequently, neither on r. 

We begin by showing that (A.ll) to (A.13) imply (A.9). 

Let us denote as i^i > 0 the minimum eigenvalue of 8 ^“^ which is a symmetric and positive 
dehnite matrix. Using that Ei/''(e) > 0, i(/x) > 0, i{p) > 0 and that the scale function a is 
bounded over x G C, if M = z^iEV''(e)i(/x)*(p)/(2 sup^gc c^(x)), we obtain that x G C and r G Vr 


Q(r,x) = ^-^E(V’'(e))/x(x)p(x)r'^S(“)r > ^-^E(V^'(e))/x(x) p(x) > MA > 0 . 
As T 2 ,n = Oa.co. (1) and C,n = Oa.co.(l)) given > 0 there exists Ai such that


^p(|T 2,„| > Ai) <oo 

n>\ 


(iCnl > < OO. 

n '> 1 ^ ' 


Let Ur be such that + Ylj^a — 'rniin{M/(4^i), 1}, for n > Ur- Hence, there exists a set 
A/i satisfying that P(A/i) = 0 and for all u ^ Mi, there exists ni £ N such that for all n > ni, we 
have that |T 2 ,n| < M and < i. 

Since iL„ 2 (x) > Q(r,x)(l - |Cn|) - t + 1^2,n|, if w ^ A/i and n > ni^r = 

max(n 7 -,ni) we have that 

inf inf iLn 2 (x) > . (A. 14) 

reV^xGC 4 

From (A.11) and the fact that nW^-^^hj / log n —)• oo,we get easily that there exist a positive 
constant A 2 and a set M 2 such that P(A/ 2 ) = 0 and for all w ^ M 2 there exists 712 ,r such that 


T('i,n| < A2 


and 


log n 


nUU 


1 


< T min 



for n > 77.2,r- Thus, using (A.14), we obtain that if w ^ Mi UM2 and n > max(?T.i,7-, 772,,-) 


inf inf (iL„i(x) + A„2(x)) > —M . 
-GVr XgC 8 


(A.15) 


On the other hand, Kns satisfies (A.13) with = Oa.co.(1); then there exist a positive A 3 and 
a set M 3 such that P(A/ 3 ) = 0 and for all u ^ M 3 there exists 773 such that |T 3,„1 < A 3 , for 77 > 773 . 
Besides, there exists 773 ,,- £ N such that ^ — ^ 3 ,t- Therefore, we obtain 

that for all oj ^ M 3 and for any n > 774 ,,- = max{773,r, 773 }, sup^gv^ ®'^PxgC |Ain 3 (x)| < A 3 T^. Taking 
Ty < min{l, M/(16 A 3 )}, we get that for t < Ty, uj ^ M 3 and n > 

NI 

sup sup |A'„ 3 (x)| < —M . (AA6) 

reVr XGC lo 

Therefore, we have shown that for all 0 < r < < 1, w ^ M 3 and n > max(77i,T-, 772,r, ?t- 4 ,t); the 

assertions (A. 16) and (A.15) hold which together with (A. 10) lead us to (A.9). 

It remains to show (A.11), (A.12) and (A.13). 

• Let us begin by proving (A.11). Note that 

Ani(x) = -;^r'^J„,o(x,a<j(x)) , 
s(x) 

where J„,c(x, a) is defined in (A.3). Note that (7) implies that 

P [ 3770 tal que V 77 > no sup sup || A„i(x)|| < — sup || J„ q(x, ao-(x))|| ) = 1 • 

V rGVr XGC A XgC ’ J 

Lemma A.1.3 entails that supx-gc II Jn,o(x, ao-(x))|| = Oa.s. (^n), which concludes the proof of (A.11). 
• Let us show that (A.13) holds. Recall that $Z_i(\mathbf{x}) = V_i(\mathbf{x}) + R(\mathbf{X}_i, \mathbf{x})\, s(\mathbf{x})^{-1}$ and $\Delta_i(\mathbf{x}) = \boldsymbol{\tau}^{\mathrm{T}} \mathbf{x}_{i,\alpha}/s(\mathbf{x})$.
By the integral mean value theorem,


/■R(x)+J?(Xi,x)S'(x) ^-Ai(x) 


Knsi^) = ~ X] - x) 


i=l 


/R(x)+R(Xi,x)?(x)-i 


V’ (t) - V’(Bi(x)) - V’'(Ri(x))(t - Ri(x)) 


dt 


= —r 




SilCuaO^i - x) '0 (^^i(x) + 0i(x) j - V'(^i(x)) - '0'(Ri(x))0i(x) 


X,;., 


where 0j(x) is an intermediate point between R(Xj,x)/s(x) and {R(Xj,x) — r"^Xj^Q,}/s(x). By 
simplicity, if i G {i : ICn^CKi — x) = 0}, we define 0i(x) = 0 since it does not change the sum. Note 
that \Xij — Xj\ < hj for i = 1, • • • , n and j = 1, • • • ,d, when — x) 7 ^ 0, thus A4 implies 

that 

sup max |R(Xi,x)| < Ag ( + /i^+M , (A.17) 

xGC*:/CH^(Xi-x)^0 J 

where Ag is a constant only depending on Hfl'j^^lloo for j ^ a and ||cx)- Then, using (7), we 

obtain that 


P 


^3no such that Vn > no 


sup max |0i(x)| < Ai 
xec ^<i<n 


r + 1C' + Y.I‘l 


j¥=a 


= 1 . 


Let /C*(u) = |/C(u)|/J’|/C(u)|(iu, then. Remark A. 1.1 implies that the density estimator based on 
/(x) = (l/n)X;”^i/C^j^ (x-X j) converges uniformly and almost surely to /x (see (A. 8 )). 

Hence, using A2, we obtain that supx;gc/(x) = Oa.s.(1) which together with the fact that is 
bounded and that each component of Xj^Q, is smaller or equal to 1 when /Ch^CX* — x) / 0, leads to 


sup sup||/Ca 3 (x)|| < }T sup max |0j(x)p sup - ^ |/CHd(Xi - x)| , 
reVrxGC Ai[t) xec xgc u ^ 


whenever A < s(x) for all x G C. Hence, from (7) we obtain that 


P 


^3no such that Vn > no 


sup sup ||/C„ 3 (x)|| < r 
reVr xGC 


IKIloo 

Ai{t) 


Ai 


+ + X] 



= 1 


where Tg^ = supx;gc(l/n) X]r=i — x)| = Oa.s.(l) does not depend on r, concluding the 

proof of (A.13). 

• Finally, to conclude the proof we will obtain (A. 12). We have that 


A„2(x) = 


1 1 


2 s 2 (x) n ^ 

1 

2 s 2 (x) 


^AHd(Xi - yL)5i'ip'{Vi{yL)) (r'^Xj^^)^ - 2R(Xj,x)r'^Xj^c 


r M„ir — 2 r M 


n2 


(A. 18 ) 
For $\mathbf{x} \in \mathcal{C}$ and $a \in \mathcal{I}_\delta = [1 - \delta, 1 + \delta]$ (with $0 < \delta < 1$), define $\mathbf{M}(\mathbf{x}, a) = \mathbb{E}\psi'(\varepsilon a)\, f_X(\mathbf{x})\, p(\mathbf{x})\, \mathbf{S}^{(\alpha)}$ and


1 X ^ 

M„i(x,a) = - ^/CHd(Xj - x)5i'0'(Fi(x)a)xj,aXj^a • 

i=l 

Then, M(x, 1) = E'i/;'(e)/x(x)p(x)S*^“\ We want to show that 

sup||M„i(x)-E'0'(e)/x(x)p(x)S(“)|| = Oa.s.(l) 

x&C 


sup ||M„2 (x)|| 
xGC 


Oa 


+ Y1 


(A.19) 

(A.20) 


Indeed, if (A.19) and (A.20) hold, using (7) and that sup^gc |S'(x) — ct(x)| 0, a is bounded in 
C, i{(j) > 0 and replacing (A.19) and (A.20) in (A.18) we get that 


^n2(x) = ^^^^E(V''(e))/x(x)p(x) r'^S(")r(l + Cn) + r ( ^ hj 

V jAo 


T 


2,n ) 


where Cn = Oa.s. (1) and T2,n = Oa.s. (1) do not depend on r and therefore, neither on r, nor x G C 
since the convergences are uniform over x, which would conclude the proof of (A. 12). 

In order to prove (A.19) it is enough to show that for all 1 < j, /c < d + 1 


sup |M„ijfc(x) - M,fc(x, 1)1 = Oa.s.(l), (A.21) 

xec 


where M„ijfc(x), M„ijfc(x, a) and Mjfc(x, a) are the components (j, fe) of matrices M„i(x, a), 
M„i(x, a) and M(x, a), respectively. 

Note that Mnijki^) = Al„ijfc(x,do-(x)) where do-(x) = cj(x)/s(x). Hence, from the bounds 

sup I A4ijfc(x, a) - Mjk{x, a)| < sup |M„ijfc(x, a) - EM„ijfc(x, a)\ + sup |EM„ijfc(x, a) - Mjk{x, a)| , 
xSC xGC xGC 

sup|A4ijfc(x) - Mjk{x, 1)1 < sup|A4ijfc(x,dt,(x)) - A/,fc(x,do-(x))| + sup |Mjfc(x,do-(x)) - Mjfc(x, 1)| , 
xGC xGC xgC 

we obtain that in order to prove (A.21), it is enough to see that 

(i) supxec supagi^ \Mni,jk{x,a) - EM„ijfc(x, a)| = Oa.s. (^^J^ogn/{n 

(ii) supxgc supagi^ |EM„ijfc(x,a) - M,fc(x,a)| = o(l) 

(hi) supxgc l^nijfc(x,da(x)) - M,fc(x,a^(x))| = Oa.s.(l) 

(iv) supxgc |Mjfc(x,dc.(x)) - Mjk{x, 1)| = Oa.s.(l). 

(i) can be obtained immediately from Lemma A.1.2 taking m = fh = 1 and considering the 
sequence of independent random variables 14 (a) = ip' (fT(Xj)eia/(T(x)) 6 i and noting that | 14 (a)| < 
llt^ ||oo/^(^) for all a and that | 14 ^(a]^) lF 2 (a 2 )| ^ IIC 2 II 001 ^21 ■ 






To show (ii), let Cq be the compact neighbourhood of C given in assumptions A2 and A3. Then, 
fx, P and a are uniformly continuous functions in Cq. Define 

7x,a(t) = E 1^1 = • 

Using that £i are independent from covariates, we obtain that 


l7x,a(t) - Ei/>'(eia)| < E 




= E|C2(eia( 


o-(t) 


cj(x) 


where 6 is an intermediate point between cr(t)/cj(x) and 1 , so that 


^ < max <; 1 , 


^(x) 

cr(t) 


Using that C 2 {u) = uip'^u) is bounded, we obtain the upper bound 


|7x,a(t) - Eij'{£ia)\ < IIC 2 I 


cj(t) 


cr(x) 


- 1 


max < 1 , 


O'(x) 

C7(t) 


= IIC 2 IU k(t) - f^(x)| max 


1 


1 

0 ’ 


— h_\ 

tr(x)’(r(t)/ 


Using that inftgCo '^(^) > ^ and sup^g^o '^(^) < have that there exists ci such that for all 

X G C and t G Cq 

| 7 x,a(t) - Ei/>'(eia)| < ci |cr(t) - a(x)| . (A. 22 ) 

Observe that 


EMnijfc(x,a) = E/CHrf(Xi - x)p(Xi)7x,a(Xi)xpQ,^xp«j^ 

= J- x)p(u)7x,a(u) 

Changing variables y = H^^(u — x), we obtain 

EM„i^, Jx, a) - a) = Jp{iidy + x) 7 x,a(Hrfy + x)/x(Hrfy + IC{y) dy 

-EV>'(eia)/x(x)p(x)5jfc^ , 

where = f yi'^^~^JC(y)dy for 1 < j,k,< + 1 is defined in A5. Then, if we denote as 

r(y,x) = p(Hdy + x) 7 x,a(Hrfy + x)/x(Hdy + x) - E^p'{£la) /x(x)p(x) we obtain that 

EM„i jfc(x, a) - Mjk{x, a) = j r(y, x)y^+^“^/C(y) dy . 

Using the uniform continuity of fx, cr and p in Cq, we obtain that given e > 0 there exists rj > 0 
such that for any x G C, u G Cq such that ||u — x|| < r/ implies |/x(u)p(u) — /x(x)p(x)| < e and 
|cr(u) — cr(x)| < e. The fact that Kj has compact support [— 1 , 1 ], entails that ||y|| < y/d for any y 
such that JC{y) 7 ^ 0. Therefore, using that maxi<j<(^/ij^„ —>■ 0, we obtain that there exists no such 
that if n > no, ||Hdy|| < t] for all y such that /C(y) / 0 and H^y + x G Co- Hence, if n > no for all 
X G C and for all y such that JC{y) / 0, we have that |/x(Hrfy + x)p(Hrfy + x) — /x(x)p(x)| < e 


U/y Xr\ 




/x(u) du. 
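and $|\sigma(\mathbf{H}_d \mathbf{y} + \mathbf{x}) - \sigma(\mathbf{x})| < \epsilon$. Using that $\gamma_{\mathbf{x},a}(\mathbf{x}) = \mathbb{E}\psi'(\varepsilon_1 a)$, (A.22) and $|\gamma_{\mathbf{x},a}(\mathbf{x})| \le \|\psi'\|_\infty$, we obtain that for all $\mathbf{x} \in \mathcal{C}$, $a \in \mathcal{I}_\delta$ and $\mathbf{y}$ such that $\mathcal{K}(\mathbf{y}) \ne 0$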

k(y,x)| < Cl supp(u)/x(u) |cj(Hrfy + x)-cr(x)l + ||'0'||oo|p(Hrfy + x)/x(Hrfy+ x)-/x(x)p(x)| 

uGCq 

< ( Cl sup p(u)/x(u) + ||i/;'||oo ) e = C2 e. 

V ueCo / 

Then, for re > no, we have that 


sup sup |EM„ijfc(x,a) - Mjfc(x,a)| 
xec a&Xs 


< C2e 


y^+"-2/C(y)dy = C3e, 


concluding the proof of (ii). 

Note that from (i) and (ii) it follows that 


sup sup |M„ijfc(x,a) - Mjfc(x,a)| 
xeC a&Xs 


which in particular leads to 


sup |iV4ijfc(x, 1) - Mjk{-K, 1)1 
xeC 


Oa 




--0, 




(A.23) 


(A.24) 


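Let us prove (iii). By (8) and (A.23), given $\eta > 0$, there exists a set $\mathcal{N}$ such that $\mathbb{P}(\mathcal{N}) = 0$ and, for any $\omega \notin \mathcal{N}$, there exists $n_1$ satisfying that, for all $n \ge n_1$, $|\widehat{a}_\sigma(\mathbf{x}) - 1| < 1/2$ and $\sup_{\mathbf{x}\in\mathcal{C}} \sup_{a\in\mathcal{I}_\delta} |M_{n1,jk}(\mathbf{x},a) - M_{jk}(\mathbf{x},a)| < \eta$, with $\delta = 1/2$. Then, for all $\omega \notin \mathcal{N}$ and $n \ge n_1$ we get that

$$\sup_{\mathbf{x}\in\mathcal{C}} |M_{n1,jk}(\mathbf{x},\widehat{a}_\sigma(\mathbf{x})) - M_{jk}(\mathbf{x},\widehat{a}_\sigma(\mathbf{x}))| \le \sup_{\mathbf{x}\in\mathcal{C}}\ \sup_{a\in\mathcal{I}_\delta} |M_{n1,jk}(\mathbf{x},a) - M_{jk}(\mathbf{x},a)| < \eta ,$$

which concludes the proof of (iii).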

Let now prove (iv). Denote C 4 = sup^gc/x(x)p(x) max(l, ). Using that /x and p are 
bounded in C, we get that 

sup IA/jj(x, Oct( x)) — Mjj{'x, 1)1 < C 4 SUP |Ei/' (eao-(x)) — E'i/'(e)| . 

xGC xGC 


Using similar arguments to those considered to bound 7 x,a in (h)) we obtain that 

|Ai(o) - Ai(l)| = |E [ip'{ea) - \ = |E?/"(e 6 ')e(a - 1)| < ||C 2 ||oo^(o - 1), 

where Ai(o) = E'(/;'(ea) and 9 is an intermediate point between a and 1. Hence, 


|Ai(a) — Ai(l)| < ||C2||oo(a 


1 ) max 
which implies that


sup |Mjj(x,acr(x)) 1)1 < C 4 IIC 2 II 00 sup |a(^(x) - 1 | sup 

xec xec xgc 



Now (iv) follows from the fact that $\sup_{\mathbf{x}\in\mathcal{C}} |\widehat{a}_\sigma(\mathbf{x}) - 1| \stackrel{a.s.}{\longrightarrow} 0$. That is, we have concluded the proof of (A.19).

By (A.17) and using that $\psi'$ is bounded and that $i(p) > 0$, we obtain that


sup ||M„2 (x)|| < 
xec 





1 

sup — 
xGC ^ 


J]|/Ch,(X,-x)| . 

i=l 


Using that /(x) = (1/n) “ ^i) converges to /x uniformly and A2, we obtain that 

supxgc /(x) = Oa.s. (1), so we obtain (A.20) and the proof is concluded. □ 


A.2 Proof of Theorem 4.1. 


We begin by proving the following Lemma which will be useful in the proof of Theorem 4.1. 

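Lemma A.2.1. Assume that A0, A2, A7 and N1 to N5 hold and that the function $\lambda(a)$ has bounded Lipschitz continuous derivatives up to order $\ell - 1$ in a neighbourhood of 0. Let $\mathbf{x}$ be an interior point of $\mathcal{S}_f$. Define

$$\widetilde{A}_{1,n}(\mathbf{x}) = \frac{1}{n}\sum_{i=1}^n \mathcal{K}_{\mathbf{H}_d}(\mathbf{X}_i-\mathbf{x})\,p(\mathbf{X}_i)\,\lambda\!\left(\frac{R(\mathbf{X}_i,\mathbf{x})}{\sigma}\right)\mathbf{x}_{i,\alpha} ,$$

where $R(\mathbf{X}_i,\mathbf{x}) = \sum_{j\ne\alpha}\{g_j(X_{i,j}) - g_j(x_j)\} + R_\alpha(X_{i,\alpha},x_\alpha)$ and $R_\alpha(X_{i,\alpha},x_\alpha)$ is given in (A.1).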
Then, we have that 

EAi,a(x) = ^^^^/i^+^p(x)/x(x)^—+ iv„(x), 
where $\sup_{\mathbf{x}\in\mathcal{S}_Q} \|\nu_n(\mathbf{x})\| = h_\alpha^{q+1}\,o(1)$.

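Proof. Using that $\lambda$ is $\ell - 1$ times differentiable, a Taylor expansion of order $\ell - 1$ together with the facts that $\lambda(0) = 0$ and $\lambda'(0) = \mathbb{E}\psi'(\varepsilon_1) = \lambda_0(\psi)$ entail that

$$\lambda\!\left(\frac{R(\mathbf{u},\mathbf{x})}{\sigma}\right) = \sum_{k=1}^{\ell-1} \frac{\lambda^{(k)}(0)}{k!}\left(\frac{R(\mathbf{u},\mathbf{x})}{\sigma}\right)^{k} + \frac{\lambda^{(\ell-1)}(\theta(\mathbf{u})) - \lambda^{(\ell-1)}(0)}{(\ell-1)!}\left(\frac{R(\mathbf{u},\mathbf{x})}{\sigma}\right)^{\ell-1}$$

$$= \frac{\lambda_0(\psi)}{\sigma}\,R(\mathbf{u},\mathbf{x}) + \sum_{k=2}^{\ell-1} \frac{\lambda^{(k)}(0)}{k!}\left(\frac{R(\mathbf{u},\mathbf{x})}{\sigma}\right)^{k} + \frac{1}{(\ell-1)!}\,\Lambda(\mathbf{u},\mathbf{x})\left(\frac{R(\mathbf{u},\mathbf{x})}{\sigma}\right)^{\ell-1} ,$$

where $\Lambda(\mathbf{u},\mathbf{x}) = \lambda^{(\ell-1)}(\theta(\mathbf{u})) - \lambda^{(\ell-1)}(0)$ with $\theta(\mathbf{u})$ an intermediate point between 0 and $R(\mathbf{u},\mathbf{x})/\sigma$. On the other hand, we also have that $|\Lambda(\mathbf{u},\mathbf{x})| \le C\,|\theta(\mathbf{u})| \le C\,|R(\mathbf{u},\mathbf{x})|/\sigma$ since $\lambda^{(\ell-1)}$ is Lipschitz.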
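The two facts used at the start of this expansion, $\lambda(0) = 0$ and $\lambda'(0) = \mathbb{E}\psi'(\varepsilon)$, can be checked numerically in a simple special case. The sketch below is only an illustration under stated assumptions, none of which come from the paper: it takes $\lambda(a) = \mathbb{E}\psi(\varepsilon + a)$, a Huber score function with tuning constant 1.345 and standard normal errors. Under those assumptions $\mathbb{E}\psi'(\varepsilon) = \mathbb{P}(|\varepsilon| < c)$.

```python
import math

def huber_psi(t, c=1.345):
    """Huber score function (an illustrative choice of psi)."""
    return max(-c, min(c, t))

def normal_pdf(t):
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def lam(a, c=1.345, lo=-10.0, hi=10.0, n=20001):
    """Trapezoid-rule approximation of E psi(eps + a) for eps ~ N(0, 1)."""
    step = (hi - lo) / (n - 1)
    total = 0.0
    for i in range(n):
        t = lo + i * step
        weight = 0.5 if i in (0, n - 1) else 1.0
        total += weight * huber_psi(t + a, c) * normal_pdf(t)
    return total * step

c = 1.345
lam0 = lam(0.0, c)                                   # should be close to 0
delta = 1e-3
deriv = (lam(delta, c) - lam(-delta, c)) / (2 * delta)  # central difference for lambda'(0)
expected = math.erf(c / math.sqrt(2.0))              # P(|eps| < c) = E psi'(eps)
print(lam0, deriv, expected)
```

The odd symmetry of $\psi$ and the symmetry of the error density give $\lambda(0) = 0$, which is why the first-order term of the expansion is $\lambda_0(\psi) R/\sigma$.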
Note that N2 and the fact that $h_j = h$ if $j \ne \alpha$ and $R(\mathbf{x},\mathbf{x}) = 0$ imply that


ii(x + H,u,x) = 


S! 


I “-J 




afiQafixj) ^ 
l\ 


j^a s=l j^a 


,, ,+i ar^\x^) , , 

+ “ ((? + !)! “ + “ (g + 1 )! 


(A.25) 
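The $h_\alpha^{q+1}$ factor attached to the $(q+1)$th derivative in this expansion is the usual order of the remainder of a degree-$q$ local polynomial approximation. A minimal numerical sketch follows, using a hypothetical smooth component $g(t) = \sin t$ and the illustrative values $q = 2$ and $x = 0.3$, none of which come from the paper: the remainder divided by $h^{q+1}$ approaches $g^{(q+1)}(x)\,u^{q+1}/(q+1)!$ as $h \to 0$.

```python
import math

def sin_derivative(s, x):
    """s-th derivative of sin at x (the derivatives cycle with period 4)."""
    return [math.sin(x), math.cos(x), -math.sin(x), -math.cos(x)][s % 4]

def remainder(x, t, q):
    """g(x + t) minus its degree-q Taylor polynomial around x, for g = sin."""
    poly = sum(sin_derivative(s, x) * t ** s / math.factorial(s)
               for s in range(q + 1))
    return math.sin(x + t) - poly

x0, q, u = 0.3, 2, 1.0
# Leading coefficient of the remainder: g^(q+1)(x) u^(q+1) / (q+1)!
limit = sin_derivative(q + 1, x0) * u ** (q + 1) / math.factorial(q + 1)
# Scaled remainders for two bandwidths; they should approach `limit`.
ratios = {h: remainder(x0, h * u, q) / h ** (q + 1) for h in (0.1, 0.01)}
print(limit, ratios)
```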


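Then, we have that

$$\mathbb{E}\big(\widetilde{A}_{1,n}(\mathbf{x})\big) = \mathbb{E}\left[\mathcal{K}_{\mathbf{H}_d}(\mathbf{X}_1-\mathbf{x})\,p(\mathbf{X}_1)\,\lambda\!\left(\frac{R(\mathbf{X}_1,\mathbf{x})}{\sigma}\right)\mathbf{x}_{1,\alpha}\right] = \frac{\lambda_0(\psi)}{\sigma}\,\widetilde{A}_{11,n} + \sum_{k=2}^{\ell-1} \frac{\lambda^{(k)}(0)}{k!\,\sigma^{k}}\,\widetilde{A}_{1k,n} + \frac{1}{(\ell-1)!\,\sigma^{\ell-1}}\,\widetilde{A}_{1\ell,n} ,$$

where, for $k = 1,\dots,\ell-1$,

$$\widetilde{A}_{1k,n} = \frac{1}{h_\alpha h^{d-1}}\,\mathbb{E}\left[\prod_{s=1}^{d} K_s\!\left(\frac{X_{1,s}-x_s}{h_s}\right) p(\mathbf{X}_1)\,R^{k}(\mathbf{X}_1,\mathbf{x})\,\mathbf{x}_{1,\alpha}\right] = \int \mathcal{K}(\mathbf{u})\,v(\mathbf{x}+\mathbf{H}_d\mathbf{u})\,R^{k}(\mathbf{x}+\mathbf{H}_d\mathbf{u},\mathbf{x})\,\mathbf{u}_\alpha\,d\mathbf{u} ,$$

$$\widetilde{A}_{1\ell,n} = \frac{1}{h_\alpha h^{d-1}}\,\mathbb{E}\left[\prod_{s=1}^{d} K_s\!\left(\frac{X_{1,s}-x_s}{h_s}\right) p(\mathbf{X}_1)\,\Lambda(\mathbf{X}_1,\mathbf{x})\,R^{\ell-1}(\mathbf{X}_1,\mathbf{x})\,\mathbf{x}_{1,\alpha}\right] = \int \mathcal{K}(\mathbf{u})\,v(\mathbf{x}+\mathbf{H}_d\mathbf{u})\,\Lambda(\mathbf{x}+\mathbf{H}_d\mathbf{u},\mathbf{x})\,R^{\ell-1}(\mathbf{x}+\mathbf{H}_d\mathbf{u},\mathbf{x})\,\mathbf{u}_\alpha\,d\mathbf{u} ,$$

where $v(\mathbf{u}) = p(\mathbf{u})f_{\mathbf{X}}(\mathbf{u})$ and $\mathbf{u}_\alpha = (1, u_\alpha, \dots, u_\alpha^{q})^{T} \in \mathbb{R}^{q+1}$. Using an $\ell$th order Taylor expansion of $v(\mathbf{x}+\mathbf{H}_d\mathbf{u})$ around $\mathbf{x}$, we get that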

v{x + Hdu) = n(x) + y D^v{x)h^u’^ + Y ^ - Z)'"n(x)l (A.26) 

/ ^ ^ m' 


0<|m|<£ 


|m|=U 


where we have used the notation of Bourbaki for the expansion, h = (/ii,..., where hj = h 
for j / a, m = (mi,..., m^) with m* G N, |m| = mj, u™ = nj=i 

.. ayU 

Using (A.25) and (A.26) in Aifc^„, k = 1, ...,£, we obtain that Au^n can be written as Aii^„ = 
Aii,j,n where 




‘-ll, 2 ,n = V[x) 


_ . d 

Yaj{xj)h Y\^tMujUadu 
t=i 


du 


‘-ll, 3 ,n — 


e-i ^ _ f d 

E E -^D'^v{x.)Yaj{xj)hh'^ Y\_d<t{ut)uju'^Uadu 

k=l \m\=k jj^a t=l 
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All,4,n 


All,5,n 


All, 6 ,n 


All,7,n 

All, 8 ,n 


All,9,n 


Aii 40 ,n 


Aii 42 ,n 


Aii^lS^ri 


Aii^14^„ 


All,15,n — 


All,l6,n = 


All,17,n — 

All,l8,n = 


k=l |m|=/c ^=1 

fY{Kt{ut)uju’^Uadu 

|m|=£ ' t=l 

^ ^ (Xa)fe^+^h"^ J JJ Kt{ut)u'i+^U^Ua du 


\m\=e 


X] djixj) h —h'^ / Y{Ktiut)uju’^Uadu 

\m\=ij^a “■ 4=1 

|m|=£ > J i=l 


= ll X 


E E l\{Kt{ut)ufno,dM 

j^am=2 t=l 

= '^(^) { E /n ^i(^4)«iU„ du 

7^Q t=l 


du 


A:=l |m|=fc j^am=2 t=l 

= E E ;^o"’»wE^'>'"/[9f«.)-4'’fe)in -K'i(iii)ii^u™UQ, du 

k=l |m|=fc . 77^0 4=1 

1 


|m|=£ jT^o 


4=1 


1-1 


EE 


fc=l |m|=/c 

1 1 


(7 + 1 )! 


a 

- a^a^^Hxa)] ]]_ Ktiut)u'i+^U^Ua du 


4=1 

d 




\m\=£ 


4=1 


£ d 

E ;s"“”<’‘>EET^sf+J+i'”/nA',(u,)«ru'”u„du 

|m|=£ j^am=2 4=1 


jT^a m=2 

1 ^ 1 
1 1 (m) 


d 


E ;;7 EE 77+- o-.(x) n dCi(t4i)i4™u'"uQ du 

|m|=£™+yam=2"^- 4=1 
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with an intermediate point between x and x + H^u. The fact that fKj{t)tdt = 0 entails 
that = 0. On the other hand, using that Kj = L is a kernel of order i, if j ^ a, we get 

that J’/C(u)ujU™UQ du = 0 for |m| < £ — 1. Moreover, we also have that J/C(u)ujU'”uQ du = 0 
if |m| = i — 1 and m ^ {i — l)ej. On the other hand, using again that L is a kernel of order i, 
we obtain that f /C(u)u™tta^^Ua du = 0 for all m with at least one component mj ^ 0, for j ^ a. 
Thus, Aii, 9 ,„ = 0. 
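The vanishing integrals above rely only on the moment conditions of a higher-order kernel. As a hedged illustration, the sketch below checks these conditions exactly for one concrete kernel that is not taken from the paper: $L(u) = \tfrac{15}{32}(3 - 10u^2 + 7u^4)$ on $[-1,1]$, a standard fourth-order kernel. The convention assumed here is that a kernel of order $\ell$ integrates to 1, has zero moments of orders $1,\dots,\ell-1$ and a non-zero moment of order $\ell$.

```python
from fractions import Fraction

def moment(coeffs, k):
    """Exact integral over [-1, 1] of u**k * P(u), with P(u) = sum coeffs[i] * u**i."""
    total = Fraction(0)
    for i, c in enumerate(coeffs):
        power = i + k
        if power % 2 == 0:  # odd powers integrate to zero on [-1, 1]
            total += Fraction(c) * Fraction(2, power + 1)
    return total

# Coefficients of L(u) = (15/32) * (3 - 10 u^2 + 7 u^4), lowest degree first.
L_coeffs = [Fraction(45, 32), 0, Fraction(-150, 32), 0, Fraction(105, 32)]

# Moments of orders 0..4: the first is 1, orders 1 to 3 vanish, order 4 does not.
moments = [moment(L_coeffs, k) for k in range(5)]
print(moments)
```

Because the product kernel factorizes, any integrand containing a power $u_j^{m}$ with $1 \le m < \ell$ in a direction $j \ne \alpha$ integrates to zero, which is the mechanism behind the cancellations in the proof.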

On the other hand, we have that 

All,2,n ^ ^ j Ka(Ua)u‘^^UadUa 

Since Z)"^u(u) with |m| = A: and are continuous and bounded functions and gj is I times 

differentiable, with bounded derivatives for all j ^ a, it follows that 

sup ||Aii, 3 ,„|| = d^O(l), sup ||Aii, 4 ,„|| =/i;^+^ 0 (l) y sup ||Aii,io,n|| = ^^ 0 ( 1 ) ■ 

xG5q xG<Sq xG<Sq 


Similarly, using that the kernels are even we have that /nti Kt{ut)ujU^Ua du = 0 when m has 
a component different from a and j different from 0. Moreover, when mg = 0 for s ^ j, a, using 
that L is a kernel of order i we get that the integral equals 0 except when rrij ^ I —\ and = 1 - 
Arguing similarly with lULi Kt{ut)ul^^u^ua du, we get that 


All,5,n 

All,6,n 


d^v{u) 


a - 1)1 dr^-^Ou, 


h^h, 


I _oO+i)f^ uQ+i 1 


du 


U=X 

1 ^. 


(d + 1 )! 


j^a " “ “J 


, jJC{u)UjUano 

hj f JC{u)u'^^U^jUadu 


(Q+iy- 


K 






which implies that 

sup ||Aii, 5 ,„|| = h^ha 0(1) sup ||Aii,6,n|| = + hi) 0 ( 1 ). 

xG5q x;e<SQ 


On the other hand, 0™u(u) for |m| = k < i is uniformly continuous, so using that and L 
have compact support in [—1,1] and that is an intermediate point between x and x + H^u, we 
have that 


sup 

Xg<SQ 




0 "^u(x) 


0 ( 1 ) 


which leads to 


sup ||Aii, 7 ,,fi|| =/i^o(l), sup ||Aii 3 ,„|| = d^+^o(l) and sup ||Aii,i 8 ,n|| = d^O(l). 

xG<Sq xG<Sq xG<Sq 
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Similarly, using that is uniformly continuous and bounded, we get that sup^g^^ \gf\^j) — 
gf\xj)\ = o(l) which implies that 


sup ||Aii,ii,„|| = /i^+^o(l) 

sup 

Aii,i2,n 

= h^O{l) 

xG5q 

Xg<SQ 



sup ||Aii 43 ,„|| =h\h + ha)o{l) 

sup 

Aii,i4,n 

= h^o{l) 

X^Sq 

xG<Sq 



sup ||Aii 45,„|| = /i^+^o(l) 

sup 

Aii,i6,n 

= 

xG5q 

xG5q 



sup II All, 17 ,nil = h^O{l) 

sup 

Aii,i8,n 

= h^o{l). 

x&Sq 

xG5q 




Using (A.25) and the fact that, for j ^ a, Kj = L is a kernel of order i and that 
using analogous arguments, we obtain that for all k = 2,... ,i — 1 

sup ||Aifc,„|| = hl+^o{l) . 
xG5q 


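Let $\widetilde{A}_{1\ell,n,s}$ indicate the $s$th coordinate of $\widetilde{A}_{1\ell,n}$. Using that $|\Lambda(\mathbf{u},\mathbf{x})| \le C\,|R(\mathbf{u},\mathbf{x})|/\sigma$, $K_j$ has support in $[-1,1]$, $v$ is bounded and (A.25), we get that, for $s = 1,\dots,q+1$,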


sup < sup / |/C(u)||?;(x + Hrfu)||A(x + HrfU,x)| |ii(x + HrfU,x)|^ ^ 

xG<Sq xG<Sq J 


< 


c 


a 


|/C(u)||u(x + Hrfu)| |i?(x + HrfU,x)|^(iu < C 2 {h + h‘^^^Y = o , 


thus. 


sup ||Ai£,„|| = /i^+^o(l) . 

xe^Q 


Hence, using that ha ^ 0 and /i —)• 0, we get that 


EA,,„W = AMa„,„ + W 

a ^' 


e-i 


k=2 


A(^)(0) ^ 1 




X 


where $\sup_{\mathbf{x}\in\mathcal{S}_Q} \|\nu_n(\mathbf{x})\| = h^{\ell}\,O(1) + h_\alpha^{q+1}\,o(1) = h_\alpha^{q+1}\,o(1)$ and the proof is concluded. □


Proof of Theorem 4.1. The proof will be carried out in several steps. In a first step, we will show that, since the scale estimator has a root-$n$ rate of convergence, it is enough to prove the result in the situation in which the scale is known. In a second step, we expand the estimator computed with known scale into two terms. The first one will converge to the asymptotic bias and the second one to a centered normal distribution, from which the conclusion follows. To obtain these last two results, some intermediate approximations will be needed.
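Step 1. For any $s > 0$, define $\Psi_{n,\alpha}(\mathbf{b},\mathbf{x},s) = (\Psi_{n,\alpha,0}(\mathbf{b},\mathbf{x},s), \dots, \Psi_{n,\alpha,q}(\mathbf{b},\mathbf{x},s))^{T}$ where

$$\Psi_{n,\alpha}(\mathbf{b},\mathbf{x},s) = \frac{1}{n}\sum_{i=1}^{n} \psi\!\left(\frac{Y_i - b_0 - \sum_{m=1}^{q} b_m (X_{i,\alpha}-x_\alpha)^{m}}{s}\right) \mathcal{K}_{\mathbf{H}_d}(\mathbf{X}_i-\mathbf{x})\,\delta_i\,\mathbf{x}_{i,\alpha} = \frac{1}{n}\sum_{i=1}^{n} \psi\!\left(\frac{Y_i - \mathbf{x}_{i,\alpha}^{T}\mathbf{H}^{(\alpha)}\mathbf{b}}{s}\right) \mathcal{K}_{\mathbf{H}_d}(\mathbf{X}_i-\mathbf{x})\,\delta_i\,\mathbf{x}_{i,\alpha} .$$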

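Using that $\sqrt{n}\,(\widehat{s}-\sigma) = O_{\mathbb{P}}(1)$, $\psi$ is Lipschitz and $\zeta(u) = u\,\psi'(u)$ is bounded, it is easy to see that, for $j = 0,\dots,q$, $D_{n,j} = \sup_{\mathbf{x}\in\mathcal{S}_Q} \sup_{\mathbf{b}} |\Psi_{n,\alpha,j}(\mathbf{b},\mathbf{x},\widehat{s}) - \Psi_{n,\alpha,j}(\mathbf{b},\mathbf{x},\sigma)| = O_{\mathbb{P}}(1/\sqrt{n})$. On the other hand, $\widehat{\beta}(\mathbf{x})$ is a solution of (6) with $s(\mathbf{x}) = \widehat{s}$, that is, $\Psi_{n,\alpha}(\widehat{\beta}(\mathbf{x}),\mathbf{x},\widehat{s}) = \mathbf{0}_{q+1}$, which implies that

$$\Psi_{n,\alpha}(\widehat{\beta}(\mathbf{x}),\mathbf{x},\sigma) = O_{\mathbb{P}}(1/\sqrt{n}) . \tag{A.27}$$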

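Denote as $\widetilde{\beta}(\mathbf{x})$ the solution of $\Psi_{n,\alpha}(\mathbf{b},\mathbf{x},\sigma) = \mathbf{0}_{q+1}$. Then, Proposition 3.1 entails that $\sup_{\mathbf{x}\in\mathcal{S}_Q} \|\mathbf{H}^{(\alpha)}[\widetilde{\beta}(\mathbf{x}) - \beta(\mathbf{x})]\| = o_{\mathbb{P}}(1)$, so using that $\sup_{\mathbf{x}\in\mathcal{S}_Q} \|\mathbf{H}^{(\alpha)}[\widehat{\beta}(\mathbf{x}) - \beta(\mathbf{x})]\| = o_{\mathbb{P}}(1)$, we get that $D_n = \sup_{\mathbf{x}\in\mathcal{S}_Q} \|\mathbf{H}^{(\alpha)}[\widehat{\beta}(\mathbf{x}) - \widetilde{\beta}(\mathbf{x})]\| = o_{\mathbb{P}}(1)$. We will further show that

$$D_n = O_{\mathbb{P}}(1/\sqrt{n}) . \tag{A.28}$$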


To prove (A.28), denote as 


Di,4x,0 =- Tp 

a n 


11 " /y-x,T„H(“)^' 


2=1 


G 


AHd(Xj - x)(5iXj_„x7„ . 


Then, a hrst order Taylor expansion and the fact that ^(/3(x), x, cj) = 0 lead us to 

^n,a(3(x),x,o-) = Tf*^c^(3(x),x,cr) +Dp„(x)H(")(3(x) -3(x)) = Dp„(x, (3(x) - 3(x)), 

(A.29) 

where = ^^(x) stands for an intermediate point between /9(x) and /9(x), so sup^g^g [^^(x) — 

/3(x)]|| = op(l). Denote as Ao(V’) = IE(V’^(£)) and 

Do(x) = - -Ao(V')p(x)/x(x)S^") . 

G 

Then, using that from N3b) is non-singular, infxgc /x(x) > 0, infxgcp(x) > 0 and Ao('i/’) / 0 
we get that infxgc (Do(x)) > 0, with i^i(A) the smallest eigenvalue of the matrix A. Hence, 
(A.27) and (A.29) implies that to show (A.28) it is enough to see that 

sup ||Di,„(x,^„) - Do(x)|| = op(l). (A.30) 

xG5q 

We get that fTDp„(x,^„) = Dii,„(x,^„) + Di 2 ,n(x) + Di 3 ,„(x) where 


ll,n(x, 4„) 

1 

= -E':H.(Xi- 
2=1 

Dl 2 ,n(x) 

1 ” 

2=1 

Dl3,n(x) 

1 "" 

= -;;E':hPx, 


, /y,-xT„H(“)/3(x)\ /y,-x?;„H(“)e 


a 


-P' 


a 


,(Y,- xl^U(-)pOQ\ _ /Y^ X?;„H(“)/3(x) 


a 


a 


Xi,aX^a 


/y,-xT„H(“)/3(X, 


2=1 


G 
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Using that V'' is Lipschitz, sup^g^^ ||H(")[^„(x) - /3(x)]|| = op(l), sup^g^ ^”=1 - x)|/n = 

Op(l), sup|^j^^(x,_x)|^o |xi,a| < 1 we obtain that, for 1 < j,m < q + 1, sup^g^^ \Dii^n,j,m{^,$n)\ = 
op(l), where Dii^n,j,m{'^,^n) is the (j, m)—th element of matrix Dii^„(x, 

On the other hand, from the bound sup|^^^(Xi-x)|^o “ /^W)l — 

for 1 < j,m < g + 1 we get that sup^g^^ |5i2,n,i,m(x)| = op(l). 

Finally, Lemma A.1.2 entails that, for 1 < j, m < g +1, sup^g^^ |5i3,n,j,m(x) -lEi3i3,„j,m(x)| = 

op(l), while standard arguments allow to show that sup^g^^ \KDi 3 ^n,j,m{^) — -Doj-,m(x)| = op(l), 
concluding the proof of (A.30) and so that of (A.28). 

Observe that since the first element of the diagonal matrix equals 1, we have that 
ga,Mq,o.iXa) = J elP{Xa,Ug_)qa{Ug_)dUa = j ej U„)ga(Ua) . 

On the other hand, /3(x) = (5r(x), , 9a\xa)Y, so using (2) we get that 

j e[H(“)/3(x)go(Xa) dx„ = j g{x)qa{Xa) dx^ = ga{Xa) , 

which implies that 


Vnha{ga,Uq,c(.^a) “ dal^Ca)) “ V^ha J ejH(“i[/3(x) - /3(x)]q'„(X q) dXg 
■\/nha J eiH("i[3(x) - 3(x)]g«(x„) dx„ < y/h^y/nDn 0 . (A.31) 


Let us denote as gaixa) = Hi"i/3(x)(7a(xa) dx„. Then, (A.31) implies that to obtain the 

asymptotic distribution of y/nha{ga,Mq^ai^a) ~ gaixa)) h is enough to derive that of 

idaixa) - ga{Xa)] = J GiH^^^^Cx) “ (3{x)]qa{Xa) dx„ , 

that is, we have reduced the problem to obtain the conclusion of Theorem 4.1, when the scale is 
known. 


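Step 2. Using that $\Psi_{n,\alpha}(\widetilde{\beta}(\mathbf{x}),\mathbf{x},\sigma) = \mathbf{0}_{q+1}$ and a first-order Taylor expansion of $\Psi_{n,\alpha}(\mathbf{b},\mathbf{x},\sigma)$ around $\beta(\mathbf{x})$, it is easy to see that

$$\mathbf{H}^{(\alpha)}[\widetilde{\beta}(\mathbf{x}) - \beta(\mathbf{x})] = \sigma\,A_{0,n}^{-1}(\mathbf{x})\,A_{1,n}(\mathbf{x}) \tag{A.32}$$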
where $A_{0,n}(\mathbf{x}) = A_{01,n}(\mathbf{x}) + A_{02,n}(\mathbf{x})$ with


Aoi,n(x) = - ^ 

^ • 1 V ^ 

2 = 1 \ 


Xi,aX,^„ 


\ fl - x)V^' Te, + 
2 = 1 ^ 


X,-ryX. 


Ao 2 ,n(x) = - — 2^(5iAHd(Xi - xjV’ I - 

” i=l V ^ 

1 A - x,^„H(“)/ 3 (x) 

Al,n(x) = - ^(5jAHrf(Xi - x)V^ f -!—- 

2 = 1 \ 


I jj(a) 


/3(x) - /3(x) 


Xi, 


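where $\theta(\mathbf{x})$ is an intermediate point between $\mathbf{H}^{(\alpha)}\widetilde{\beta}(\mathbf{x})$ and $\mathbf{H}^{(\alpha)}\beta(\mathbf{x})$. Denote as $v(\mathbf{u}) = p(\mathbf{u})f_{\mathbf{X}}(\mathbf{u})$ and $A_0(\mathbf{u}) = v(\mathbf{u})\,\lambda_0(\psi)\,\mathbf{S}^{(\alpha)}$. Lemma A.1.2 allows us to show that $\sup_{\mathbf{x}\in\mathcal{S}_Q} |A_{01,n}(\mathbf{x}) - A_0(\mathbf{x})| = o_{\mathbb{P}}(1)$.

On the other hand, the fact that $\psi''$ is bounded, that $\sup_{\mathbf{x}\in\mathcal{S}_Q} \|\mathbf{H}^{(\alpha)}[\widetilde{\beta}(\mathbf{x}) - \beta(\mathbf{x})]\| = o_{\mathbb{P}}(1)$ and that each component of $\mathbf{x}_{i,\alpha}$ is smaller than or equal to 1 when $\mathcal{K}_{\mathbf{H}_d}(\mathbf{X}_i-\mathbf{x}) \ne 0$, imply that $\sup_{\mathbf{x}\in\mathcal{S}_Q} |A_{02,n}(\mathbf{x})| = o_{\mathbb{P}}(1)$, so $\sup_{\mathbf{x}\in\mathcal{S}_Q} |A_{0,n}(\mathbf{x}) - A_0(\mathbf{x})| = o_{\mathbb{P}}(1)$.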

In Step 2.1, we study the asymptotic behaviour of 


Bn = a-s/nha / Aq ^(x)Ai,„(x)g„(x„) dx„ 


and we show that 


bg, 0 ( 3 ^ 0 ) — 




( 9 T 1 )! 


D 


-^g+l Sg,^Q-(XQ-)) , 






,Eip‘^{e) 


/; 


qIM 




(A.33) 

(A.34) 

(A.35) 


^o(V’) J fxiXa,:>i-q)p{Xa,^a) 

We will then show, in Step 2.2, that 

[gaixa) “ ga{Xa)] “ GiB^ = ay/uha j 6^ (^AfJ^^(x) - Aq ^(x)^ Ai,„(x)g'o(Xa) dx^ = Op(l) , 

which together with (A.33) concludes the proof. 


Step 2.1. Recall that Yi — x.j^QH^“^/3( x) = aci + R(Xj, x), so that 



E { Ip 


Yi - Xi,„H(“)/3(x) 


iXi ^ = A 


a 




(A.36) 


Dehne Ai^ji(x) as in Lemma A.2.1, i.e., 

Ai,„(x) = l^/CH,(Xi -x)p(X,)A 


2=1 


X,-, 
and note that (A.36) entails that $\mathbb{E}A_{1,n}(\mathbf{x}) = \mathbb{E}\widetilde{A}_{1,n}(\mathbf{x})$. Moreover, we have that $B_n = B_{n,1} + B_{n,2}$ where


B„,i = j AQ^{x)Ai^n{x.)qa{xa)dxa 

Bn,2 = cry/nha j Aq ^(x) Ai,n(x) - Ai,n(x) qa{Xa)dXg 

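Then, to derive (A.33), we have to show that

a) $B_{n,1} \stackrel{p}{\longrightarrow} b_{q,\alpha}(x_\alpha)$ with $b_{q,\alpha}(x_\alpha)$ given in (A.34).

b) $B_{n,2} \stackrel{D}{\longrightarrow} N_{q+1}(\mathbf{0}, \Sigma_{q,\alpha}(x_\alpha))$ where $\Sigma_{q,\alpha}(x_\alpha)$ is defined in (A.35).


a) To show that $B_{n,1} \stackrel{p}{\longrightarrow} b_{q,\alpha}$, it is enough to see that $\mathbb{E}B_{n,1} \to b_{q,\alpha}$ and that, for all $1 \le j \le q+1$, $\mathrm{Var}(B_{n,1,j}) \to 0$.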

Lemma A.2.1 together with the fact that Ao(u) = n(u)Ao('0)S(“) and = /3(2'J+3)/2 

entail that EBn,i = ay/nha f Ao(x)“^EAi^n(x)(?a(xa) dxa = Bn^n + Bi2,n, where 

Bn,n = 

Bl2,n = y/ndi^ j AQ^{x)E{lJn{x)} qaixg_)dxa. 

Hence, EBn,i —)• bg^Q, since sup^g^^ ||^'n(x)|| = ht^^o{l) and y/nh^htt^ = /5(29+3)/2 gj^i^ail that 
Bl2,n 0- 
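The identity $\sqrt{n h_\alpha}\,h_\alpha^{q+1} = \beta^{(2q+3)/2}$ used here follows from the bandwidth choice $h_\alpha = \beta\,n^{-1/(2q+3)}$, since $\sqrt{n h_\alpha}\,h_\alpha^{q+1} = n^{1/2}\,h_\alpha^{(2q+3)/2}$ and $h_\alpha^{(2q+3)/2} = \beta^{(2q+3)/2}\,n^{-1/2}$. The short check below is only a numerical illustration; the values of $\beta$, $q$ and $n$ are arbitrary.

```python
import math

def scaled_bias_rate(n, beta, q):
    """sqrt(n * h) * h**(q + 1) for the bandwidth h = beta * n**(-1 / (2q + 3))."""
    h_alpha = beta * n ** (-1.0 / (2 * q + 3))
    return math.sqrt(n * h_alpha) * h_alpha ** (q + 1)

beta, q = 0.8, 1  # illustrative values only
target = beta ** ((2 * q + 3) / 2)
# The scaled rate does not depend on n and equals beta**((2q+3)/2).
values = [scaled_bias_rate(n, beta, q) for n in (10 ** 2, 10 ** 4, 10 ** 6)]
print(target, values)
```

This is why a remainder of size $h_\alpha^{q+1}\,o(1)$ is negligible after multiplication by $\sqrt{n h_\alpha}$, while the leading $h_\alpha^{q+1}$ term produces a non-vanishing bias.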

We will now show that VAR(Hn,i,j) —)• 0, for 1 < j < <7 + 1. denote as B = Ao(i/’)S*'"^Bn,i /(7 
and Bj its j—th component. Then, it will be enough to show that the variance of Bj converges to 
0. Note that Ao(u) = '(;(u)Ao(V')S^“^ implies that 



where = diag(/i,...,/i) G i)x(d i)_ Thus, using that ijj is bounded, qa is continuous 

and bounded, we get that |C(H(i_i,Xj,X q,)! < C, for all i. Since p < 1 and \Xi^a — Xa\ < ha if 
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Ka ((-^i,a “ Xa)/ha) / 0, we conclude that 


VAR{Bj) = -^Var [ 
tl/y V 




< —E 

h/y 


K' 


hoL ) 

2 / ^l,a A 2 




^l,a ^0 ^ ^ 

hrv 


hn 




hrv 


20-l)‘ 


- hr. I V k 


2 / ^ct ^cx. \ ^2 


c (H^_i,v,Xa)/x(v)dv 


— J' (Xq, +V^), Xq,)/x(3^ck “hV^) dWo-dVc 


Then, from the dominated convergence theorem it follows that VAR(i?j) —>• 0 since —)• 0 when 

n —)• oo and (^^(Orf_i, (x^, v„),Xq) = 0, since i?(x, x) = 0 and A(0) = 0, concluding the proof of a). 


b) Let Bn ,2 = (Ao(V^)/a)S(“)B„, 2 . To obtain b) it is enough to see that, for any c S c ^ 0, 


c^Bn ,2 = Ya=i iV(0, c'^Sn(x„)c), where 


Sii(xa) = E'0^(e) 


9a (Xa) 


/x(xQ,,Xa)p(Xa,Xa) 

Denote as Hrf_i = diag(/i ,... ,h) G and 

'i?(X„x) 


dX-rv^lr\ . 


y(ej,Xi,x) = Siijj ( Ei + j _ p(Xi)A 


a 


a 


7(ej,/i,Xj,Xa) = 


1 f 1 




h^-i J u(x) ^ 


Xjj- Xj 


(xq ) F (cj, Xj, (x«, x„)) dxc 


lUa) j-j Xj{uj)V{ei,Xi, (x„,Xi,„ + Hd_iUa))duQ , 
(x„,Xi,« + Hrf_iU„) Jy*- ^ - - - 


w = _ 


=ii:n 


Xj Q, — Xq, \ ^ /Tv \ 

-- I xyQ,7(e, h, Xj, Xq,) . 


Note that 7 (ej, 0, Xj, Xq) is well defined as 


7(ei,0,Xj,XQ) = 


9a (Xj 


x(xq, Xj q) 


(e/) Xj, (xq, Xj^q)) 


(A.37) 


It is clear that 


Aijn(x) Ai^^(x) — ^ Xhjj(X j x)xj^Ql^(ej, Xj,x), 


i=l 


hence. 


B 


n,2 — 


Vnha jv ^(x) Ai,„(x) - Ai,„(x) qa{Xa)dXa 
= iz^<- Xj,Q7(e„h,X„XQ) = 


/n/iQ 


2 = 1 


2 = 1 
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Let c G c 7 ^ 0. Since E(y(e*, Xj, x)|Xj) = 0, for all x, we have that = 0 so = 

0. Besides, as '0 and p are continuous functions and |(5i| < 1 we have that |B(ei,Xj,x)| < C for 
some constant C > 0 which entails that 7 (ej, h, 'Ki,Xa) is bounded since infxg^g u(x) > 0 and is 
bounded on its support. Therefore, using that < 1 when ~ x) / 0, we obtain that, 

for some general constant Ci > 0, 


< Cm 

< C2 


1 


2=1 


{nha)^/‘^ 

1 


E 




^i,a Xq 
hrv 


= c, 


1 1 
y/nha ha 




U — Xr 


hn 


fxM du 


y/nha 


0 . 


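Hence, applying Lyapunov's central limit theorem to the triangular array of independent variables $\{\mathbf{c}^{T}W_{i,n}\}$, the proof of b) follows if we show that $\lim_{n\to\infty} \mathrm{Var}(\mathbf{c}^{T}\sum_{i=1}^{n} W_{i,n}) = \mathbf{c}^{T}\Sigma_{11}(x_\alpha)\mathbf{c}$ or, equivalently, that $\lim_{n\to\infty} \mathrm{Var}(\sum_{i=1}^{n} W_{i,n}) = \Sigma_{11}(x_\alpha)$.

Using that $W_{1,n},\dots,W_{n,n}$ are independent and that $\mathbb{E}W_{i,n} = \mathbf{0}$, we get that $\mathrm{Var}(\sum_{i=1}^{n} W_{i,n}) = n\,\mathrm{Var}(W_{1,n}) = n\,\mathbb{E}(W_{1,n}W_{1,n}^{T})$. Given $1 \le s,m \le q+1$, denote as $E_{sm} = n\,\mathbb{E}(W_{1,n,s}W_{1,n,m})$ where $W_{1,n,m}$ is the $m$th component of $W_{1,n}$. We have to show that $E_{sm}$ converges to the $(s,m)$th element of $\Sigma_{11}(x_\alpha)$.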


Let M{h, u, Xq) = E 7 ^(ei, h, u, Xq,)|Xi = u 


then we have that 


E_qrn. — 


hr] 


-E 




2 [ ^l,a Xc 


a \ 2 


he 
— \ 


7^(ei,/i,Xi,x„) 


^l,a Xa 

ha 

s+m—2 


s+m—2 


/xK.U„),iu 

J+ Xa,Ug_),Xa)fxiUaha +X„,U„) du . 


(A.38) 


Note that (A.37) implies that M(0,u, a:„) is well defined and equals 


A/(0,u,Xq,) 


qIM 

v'^{Xa,Ug_) 


E [U^(ei,u, {xa, 


w))|Xi 



(A.39) 


Hence, taking limit in (A.38) and using the dominated convergence theorem, we get that 


lim Esm= Kl{Ua)u^o^'^ ‘^M{0,{Xa,Via),Xa)fx{Xa,'aa)du. 
n—>-oo J — — 


Using that 


E 


) |Xi 


a 




cr 


we get that 

E [U^(ei,u, (x«,Ua))|Xi = u] =p(u)A 2 (i2(u, (x«, u„))) -p^(u)A' 


2t n\2 / -^(u, (a:a,u„)) 


a 


(A.40) 
where $\lambda_2(a) = \mathbb{E}\psi^{2}(\varepsilon + a)$. Then, the fact that $R((x_\alpha,\mathbf{u}_{\underline{\alpha}}),(x_\alpha,\mathbf{u}_{\underline{\alpha}})) = 0$, $\lambda(0) = 0$ and $\lambda_2(0) = \mathbb{E}\psi^{2}(\varepsilon)$, together with (A.39) and (A.40), entail that


M(0, {Xa,Ua),Xa) = 


qIM 

v'^{Xa,Ua) 


p{Xa,na)E'ljj‘^{e) 


Hence, we have 


r F _ \ s+m-2r I , ^ 

lim EgYTl - / Efy^yUfyjllQ^ f'X.yXfy^'^a) 

n—>-oo / — 


= IKip'^ie) j 


qii^g 


/x(xq,,u«)p(x«,u„) 


v‘^{Xa,na) p{Xa,na) 


EV>^(e) 


where vim is the 


^s,m)th element of the matrix Vq, concluding the proof of b). 


Step 2.2 To conclude the proof, we have to show that 

idaixa) - Qaixa)] - = a^/uK^ Jej (^Af^)^(x) - Aq ^(x)^ Ai^„(x)g„(x„) dxa = 

Note that (A|^,',^(x) - Aq ^(x))Ai^„(x) = Di(x) + D 2 (x) with 
Di(x) = (A();),(x) - Ao^(x))EAi,„(x) 


(A.41) 


D2(x) = (Aq )^(x) - Aq ^(x)) (^Ai,„(x) - EAi,„(x 

We will show that, for all 1 < j g + 1 


ynha sup |iAij(x)| 0 

x&Sq 

s/nha sup |-D 2 ,j(x)| — 

xe^Q 


0 , 


(A.42) 

(A.43) 


where $D_{\ell,j}(\mathbf{x})$ is the $j$th coordinate of $D_\ell(\mathbf{x})$, $\ell = 1,2$, which entails that (A.41) holds, concluding the proof.

Fix $1 \le j \le q+1$. In order to prove (A.42), observe that Lemma A.2.1, the fact that $\mathbb{E}A_{1,n,j}(\mathbf{x}) = \mathbb{E}\widetilde{A}_{1,n,j}(\mathbf{x})$ and the Cauchy-Schwarz inequality entail that


sup |L>ij(x)| < sup 

xSSq xSSq 


(Ao,n(x) - A, 


- 1 ' 
0 . 


0{ht') 


where the term o{h^^) does not depend on x since v and are bounded. On the other hand. 


since 


SO) is non-singular, infxg^Q |r'(x)| > 0, Aq{'iP) / 0 and sup^g^^ Ao,n(x) — Ao(x) 


we get that sup^g^^ 


^0,n 

P 


X) - A 


-1/ 


0. Hence, since y/nhah^^ = /3(2<?+3)/2 


y/nha supxg^Q |llij(x)| 0 for all 1 < J < (? -|- 1, SO the proof of (A.42) is concluded. 
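To prove (A.43), we will use Lemma A.1.2 with $\theta_n = \sqrt{\log n/(n h_\alpha h^{d-1})}$ applied to each coordinate of the vector $A_{1,n}(\mathbf{x}) = (A_{1,n,1}(\mathbf{x}),\dots,A_{1,n,q+1}(\mathbf{x}))^{T}$, obtaining that, for $1 \le j \le q+1$,

$$\sup_{\mathbf{x}\in\mathcal{S}_Q} \left|A_{1,n,j}(\mathbf{x}) - \mathbb{E}A_{1,n,j}(\mathbf{x})\right| = O_{\mathbb{P}}\!\left(\left(\frac{\log n}{n h_\alpha h^{d-1}}\right)^{1/2}\right) .$$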


On the other hand, as above, from Lemma A.2.1, we get that sup^g^^ EAi^„j(x) = 

Then, using sup^g^^ Aq^(x) — Aq ^(x) 0 and infxeSg i^i(Ao(x)) > 0, we conclude that 

supxg^Q t'g+i(AQ ),^(x)) = Op(l). Hence, using (A.32), we obtain that 


sup 

xGSq 


H(“l|^^(x) -/!(x)||| < »,(/»»+') +0, 


(A.44) 


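Let $\mathcal{K}^{*}(\mathbf{u}) = |\mathcal{K}(\mathbf{u})|/\int |\mathcal{K}(\mathbf{u})|\,d\mathbf{u}$; then, Remark A.1.1 implies that $\widehat{f}(\mathbf{x}) = (1/n)\sum_{i=1}^{n} \mathcal{K}^{*}_{\mathbf{H}_d}(\mathbf{x}-\mathbf{X}_i)$ converges uniformly and almost surely to $f_{\mathbf{X}}$ (see (A.8)). Hence, using A2, we obtain that $\sup_{\mathbf{x}\in\mathcal{C}} \widehat{f}(\mathbf{x}) = O_{a.s.}(1)$ which, together with the fact that $\psi''$ is bounded and that each component of $\mathbf{x}_{i,\alpha}$ is smaller than or equal to 1 when $\mathcal{K}_{\mathbf{H}_d}(\mathbf{X}_i-\mathbf{x}) \ne 0$, leads together with (A.44) to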


sup ||Ao 2 ,n(x)|| < C sup 
xGSq xG<Sq 


h(")[/3(x) - /3(x)] sup/(x) < op(/i^+^) + Op f 


1 /2N 


<7 + 1 


which together with the fact that ha = /?re 25+3 and n 29+3/1“ ^/logn —)• 00 implies that 


sup ||Ao2,n(x)||Op 
x€Sq 


logn\ 


1/2^ 


9+1 

< n 29+3 


< op(l) 


log re \ 
/i“-V 


1/2 


Op (1) + 


log re 


(7 + 1 ^ 
n'^q+3 


Op (1) 

(A.45) 


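Recall that $\sup_{\mathbf{x}\in\mathcal{S}_Q} \nu_{q+1}(A_0^{-1}(\mathbf{x})) < \infty$, since $\inf_{\mathbf{x}\in\mathcal{S}_Q} \nu_1(A_0(\mathbf{x})) > 0$. Therefore, using the Cauchy-Schwarz inequality, the fact that $A_0^{-1}(\mathbf{x}) - A_{0,n}^{-1}(\mathbf{x}) = A_{0,n}^{-1}(\mathbf{x})\,(A_{0,n}(\mathbf{x}) - A_0(\mathbf{x}))\,A_0^{-1}(\mathbf{x})$, $A_{0,n}(\mathbf{x}) = A_{01,n}(\mathbf{x}) + A_{02,n}(\mathbf{x})$ and $\sup_{\mathbf{x}\in\mathcal{S}_Q} \nu_{q+1}(A_{0,n}^{-1}(\mathbf{x})) = O_{\mathbb{P}}(1)$, we get that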

\/nha sup |i) 2 j(x)| < crOii/re^ sup ||Ao,n(x) - Ao(x)|| sup Ai,„(x) - EAi,„(x) 


xeSq 


xGSq xG5q 

< O2 sup ||Aoi,n(x) - Ao(x)||Op I 


xG5q 

+C 2 sup ||Ao2,n(x)||Op 
xG5q 


/l“- 

logre\ 

/i““^ / 


< O2 sup ||Aoi,n(x) - Ao(x)||Op 
xG<Sq 


logrey/^\ 

pir) 


(A.46) 


where the last inequality follows from (A.45). 
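The bound on $D_2$ rests on the algebraic identity $A_0^{-1} - A_{0,n}^{-1} = A_{0,n}^{-1}(A_{0,n} - A_0)A_0^{-1}$, which converts closeness of the matrices into closeness of their inverses. The sketch below verifies it numerically on two hypothetical $2 \times 2$ matrices; the entries are arbitrary and do not come from the paper.

```python
def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matmul(p, r):
    """Product of two 2x2 matrices."""
    return [[sum(p[i][k] * r[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def sub(p, r):
    """Entrywise difference of two 2x2 matrices."""
    return [[p[i][j] - r[i][j] for j in range(2)] for i in range(2)]

A0 = [[2.0, 0.3], [0.3, 1.5]]      # plays the role of the limit matrix
A0n = [[2.05, 0.28], [0.31, 1.47]]  # plays the role of its perturbed estimate

lhs = sub(inv2(A0), inv2(A0n))
rhs = matmul(matmul(inv2(A0n), sub(A0n, A0)), inv2(A0))
# Largest entrywise discrepancy between the two sides of the identity.
max_gap = max(abs(lhs[i][j] - rhs[i][j]) for i in range(2) for j in range(2))
print(max_gap)
```

The identity holds for any pair of invertible matrices of the same size, so the uniform rate of $A_{0,n} - A_0$ transfers to the inverses once the norms of both inverses are bounded.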


Recall that $\lambda_1(t) = \mathbb{E}\psi'(\varepsilon_1 + t)$ and $\lambda_1(0) = \lambda_0(\psi)$. Denote


^l,n 


x) = EAoi, n(x) = E 


/CH,(Xi-x)r(Xi)Ai 


Xl,aX^, 


= y/C(u)n(x + Hrfu)Ai 


R(x + HrfU,x) 


a 


UqU^ du. 


Let $A_{01,j,m}(\mathbf{x})$ be the $(j,m)$th element of the matrix $A_{01,n}$. Then, arguments analogous to those considered in the proof of Lemma A.1.2 allow us to show that, for $1 \le j,m \le q+1$,


sup 

xGSq 




= Of 


log n 
nhn,h’^~^ 


1/2N 


Hence, 


sup ||Aoi,n(x) - Ap„(x)||Op 

xGSq 


lognA 


1/2^ 


<?+! /logn 
< n 25+3 


Op (1) = op(l) 


(A.47) 


Hence, (A.46) and (A.47) entail that to conclude the proof of (A.43) we only have to show that 

dogn^ 


sup ||Ai,„(x) - Ao(x)||Op 

xG5q 




= Op(l) 


(A.48) 


Denote as Aoj',m(x) and Ai^„j^m(x)the {j,m) element of Ao(x) and Ai^„(x), respectively. Then, 
using that v is i times differentiable, Ai = A' is £ — 1 times differentiable, the kernel L is of order 
£ and that R{x + HrfU,x) = since gj 

and ga' are continuously differentiable functions, we obtain that, for 1 < j,m, < g + 1 

'R(x + HrfU,x) 


sup |Ai,„j,m(x) - Aoj,m(x)| = sup 

xGSq xG<Sq . 


/C(u) 


n(x + Hrfu)Ai 


cr 


i?(x, x) 


a 


-n(x)Ai 

< O2 (h^ + < Oah^+i 


Uj_^OiUm^a dn 


which allow to conclude that, for 1 < m < g + 1, 


sup I -Aq (x) I Op 

xG<Sq 




1/2N 


which combined with the fact that log ^) —)• 0 conclude the proof of (A.48) and 

also that of the Theorem 4.1. □ 

Proof of Theorem 4.2. The proof follows using similar arguments to those considered in the 
proof of Theorem 4.1, noting that 


gaLAxa)-g^J;'Hxa) 


= 1/1 I e 


= 1/lh- 


u+1 


/< 


0{Xa, Ua) - P{x a, U«) g«(Ua) dUo 


't/+l 


h(“) 


(3{Xa, U„) - P{Xa, Ua) ga(u„) . □ 